Member since 02-17-2015
      
40 Posts | 25 Kudos Received | 3 Solutions
            
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3268 | 01-31-2017 04:47 AM |
|  | 3673 | 07-26-2016 05:46 PM |
|  | 9035 | 05-02-2016 10:12 AM |
			
    
	
		
		
08-03-2017 06:37 AM
This was back in 2016; nowadays I would go for NiFi (open source) or StreamSets (free to use, pay for support). Flume is deprecated in Hortonworks now and will be removed in future 3.* releases: deprecations_HDP.
						
					
		
01-31-2017 06:18 AM
I had a similar problem. I had enabled agent_tls, but the keystore field was not filled in or the file was in a different location. After that the server did not start anymore and I needed to roll back the setting; thanks for your post.

I used the mysql command-line tool to connect to the MySQL db as root and executed an update:

use scm;
update CONFIGS set VALUE='false' where ATTR='agent_tls';
Query OK, 1 row affected (0.05 sec)

After a restart of cloudera-scm-server, the server was working again and I could enter the UI.
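For reference, a minimal sketch of the same rollback plus restart in one go; it assumes the default setup with the Cloudera Manager schema named scm in MySQL and root credentials at hand:

# Disable agent TLS directly in the CM database (assumes MySQL with the default 'scm' schema)
mysql -u root -p scm -e "UPDATE CONFIGS SET VALUE='false' WHERE ATTR='agent_tls';"

# Restart the Cloudera Manager server so it picks up the change
service cloudera-scm-server restart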
						
					
		
01-31-2017 04:47 AM
When I used the FullyQualifiedDomainName (with a '.' in it), the repo works fine!

parcelRepositories: ["http://localrepo.cdh-cluster.internal/parcels/cdh5/", "http://localrepo.cdh-cluster.internal/parcels/spark2/"]
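A quick sanity check I would run from a cluster host (a sketch; it assumes the repo serves the usual manifest.json for the parcel directory):

# Check that the fully qualified name resolves on the cluster hosts
getent hosts localrepo.cdh-cluster.internal

# Check that the parcel repo is reachable and serves its manifest
curl -s http://localrepo.cdh-cluster.internal/parcels/cdh5/manifest.json | head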
						
					
		
12-14-2016 10:47 AM
I'll try that out this week and let you know! Thx for your advice.
						
					
		
12-13-2016 02:41 AM
Localrepo synced the latest versions from:
- ClouderaDirector
- ClouderaManager

Also serving parcels:
- CDH
- spark2

Bootstrap config:

cloudera-manager {
  ...
  repository: "http://localrepo/cloudera-manager/"
  repositoryKeyUrl: "http://localrepo/cloudera-manager/RPM-GPG-KEY-cloudera"
}
...
cluster {
  products {
    CDH: 5
  }
  parcelRepositories: ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]
  ...
}

We start cloudera-director-client bootstrap-remote with the config file. Cloudera Director provisions Cloudera Manager, the datanodes and the masters, but the script fails at around step 870/900.

There are no errors in the Cloudera Manager logs; the error appears in the Cloudera Director log: it reads an element from an empty collection while building some repo list.

Bootstrap remote with a config file ends in a failed state, /var/log/cloudera-director-server/application.log:

[2016-12-13 10:00:53] INFO  [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: >> BootstrapClouderaManagerAgent$HostInstall/4 [DeploymentContext{environment=Environment{name='DataLake-devtst', provider=InstanceProviderConfig{t ...
[2016-12-13 10:00:53] ERROR [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
java.util.NoSuchElementException: null
        at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154)
        at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:307)
        at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:284)
        at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.getRepoUrl(BootstrapClouderaManagerAgent.java:325)
        at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.newApiHostInstallArguments(BootstrapClouderaManagerAgent.java:307)
        at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.access$200(BootstrapClouderaManagerAgent.java:63)
        at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:162)
        at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:112)

Is this a bug? Or am I doing something wrong?

The local repo looks like this, and works fine for installing Cloudera Director:

[root@localrepo mirror]# ls -ARls | grep /
./cloudera-director:
./cloudera-director/repodata:
./cloudera-director/RPMS:
./cloudera-director/RPMS/x86_64:
./cloudera-director/RPMS/x86_64/repodata:
./cloudera-manager:
./cloudera-manager/repodata:
./cloudera-manager/RPMS:
./cloudera-manager/RPMS/x86_64:
./cloudera-manager/RPMS/x86_64/repodata:
./parcels:
./parcels/cdh5:
./parcels/spark2:
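For completeness, the bootstrap is started roughly like this (a sketch; config file name, credentials and host:port are placeholders for our environment):

# Kick off the remote bootstrap against the Cloudera Director server
cloudera-director bootstrap-remote cluster.conf \
  --lp.remote.username=admin \
  --lp.remote.password=admin \
  --lp.remote.hostAndPort=127.0.0.1:7189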
						
					
		
09-07-2016 10:14 AM (1 Kudo)
As @Jean-Philippe Player mentions, reading a Parquet directory as a table is not yet supported by Hive. Source: http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html. You are able to do it in Impala:

-- Using Impala:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
  STORED AS PARQUET
  LOCATION '/user/etl/destination';

With some Spark/Scala code you can generate the CREATE TABLE statement based on a Parquet file:

// sqlContext is the SQLContext available in spark-shell; derive the schema from an existing Parquet file
sqlContext.read.parquet("/user/etl/destination/datafile1.dat").registerTempTable("mytable")
val df = sqlContext.sql("describe mytable")
// Each describe row is "colname, data-type, comment"; keep "colname data-type"
val columns = df.map(row => row(0) + " " + row(1)).collect()
// Print the Hive create table statement (LOCATION points at the directory, as in the Impala example):
println("CREATE EXTERNAL TABLE mytable")
println(s"  (${columns.mkString(", ")})")
println("STORED AS PARQUET")
println("LOCATION '/user/etl/destination';")
						
					
		
07-27-2016 11:47 AM
Hi @Junichi Oda,

We have the same error in the Ranger log, even when the group names are filled:

ERROR LdapUserGroupBuilder [UnixUserSyncThread] - sink.addOrUpdateUser failed with exception: org/apache/commons/httpclient/URIException, for user: userX, groups: [groupX, groupY]

I have inspected the source code of ranger-0.6, which is part of HDP-2.4.3.0, our current version of the stack. Interestingly enough, all calls to the remote server inside LdapUserGroupBuilder.addOrUpdateUser(user, groups) are wrapped in a try-catch(Exception e): there are addUser, addUserGroupInfo and delXUserGroupInfo. But we don't see those in the log. The addOrUpdateUser call itself is wrapped in a try-catch(Throwable t), so it looks like it is an Error, not an Exception!

I found the RANGER-804 ticket referring to missing classes. I copied the jars into '/usr/hdp/current/ranger-usersync/lib' from another folder. The code runs, but I have a certificate PKI error at the moment because we use LDAPS; still, this might get you further.

Greetings, Alexander
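For what it's worth, the workaround looked roughly like this on our side (a sketch; the source jar path varies per HDP version, hence the find to locate a commons-httpclient jar first):

# Locate a commons-httpclient jar elsewhere in the HDP install (exact path varies per version)
find /usr/hdp -name 'commons-httpclient*.jar' 2>/dev/null

# Copy it into the usersync lib dir so LdapUserGroupBuilder can load URIException (example source path)
cp /usr/hdp/current/hadoop-client/lib/commons-httpclient-3.1.jar /usr/hdp/current/ranger-usersync/lib/

# Restart usersync to pick up the jar
service ranger-usersync restart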
						
					
		
07-26-2016 06:00 PM
Hi @Zaher,

Depending on your data, you should care about the channel you choose. The memory channel is simple and easy, but data is lost when the Flume agent crashes (an OutOfMemory being the most likely cause, power/hardware issues also possible). There are channels with higher durability for your data; the file channel is very durable when the underlying storage is redundant as well. Take a look at the Flume channels and their configuration options.

For your OutOfMemory problem you can decrease the transaction and batch capacity and increase the heap in the flume-env config in Ambari, as @Michael Miklavcic suggests; a sketch of the heap setting is below.
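A minimal sketch of that heap override in the flume-env template (Flume > Configs in Ambari); the -Xmx value is only an example, size it to your channel capacity and batch sizes:

# Example heap settings for the Flume agents (adjust -Xmx to your workload)
export JAVA_OPTS="-Xms512m -Xmx2048m -Dcom.sun.management.jmxremote"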
						
					
		
07-26-2016 05:46 PM (2 Kudos)
We manage our Flume agents in Ambari. We have 3 'data-ingress' nodes among many nodes. These nodes are bundled in a config group named 'dataLoaders', located at the top of Ambari > Flume > Configs. The default flume.conf is empty; for the config group 'dataLoaders' we override the default and add 2 agents:

- Pulling data from a queue and putting it in Kafka + HDFS
- Receiving JSON and placing it on a Kafka topic

Each host in the config group will run the 2 agents, which can be restarted separately from the Ambari Flume summary page. When you change the config, it is traceable/audited in Ambari. A restart from Ambari will place the new config file for the Flume agents. The Ambari agent on the Flume host will check whether the process is running and alert you when it is dead. Ambari will also help you when upgrading the stack to the latest version(s).

Notes:

- You cannot put a host in multiple config groups (don't mix responsibilities).
- The configuration is plain text with no validation at all (start and check /var/log/flume/**.log; see the quick check after this list).
- Rolling restart for a config group is not supported (restart the Flume agents one by one).
- The Ambari 'alive' checks are super simple: a locked-up agent counts as running, even when it is not working...
- The Ambari Flume data insight charts are too simple (Grafana is coming, or use a JMXExporter -> Prometheus).
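By 'start and check' I mean something along these lines (a sketch; the exact log file name depends on the agent names in your flume.conf):

# Confirm the agent processes came back after the restart from Ambari
ps -ef | grep '[f]lume'

# Watch the agent logs for configuration or startup errors
tail -F /var/log/flume/*.log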
						
					