Member since 02-08-2016

- 36 Posts
- 18 Kudos Received
- 4 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1937 | 12-14-2017 03:09 PM |
| | 2885 | 08-03-2016 02:49 PM |
| | 5377 | 07-26-2016 10:52 AM |
| | 5534 | 03-07-2016 12:47 PM |
12-14-2017 03:09 PM
Thank you @Matt Andruff for your reply. I resolved the issue: I had another .jar in the /lib directory containing the same code under a different file name. I'm not sure how it affected the execution of the job, but after removing it everything works fine, for now at least.
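For anyone hitting a similar conflict, here is a minimal, hypothetical sketch (not part of the original job; the directory path is a placeholder) that scans a lib directory and reports classes packaged in more than one jar, which is how a duplicate jar under a different file name can shadow the intended code on the classpath:

```java
import java.io.File;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: list classes that occur in more than one jar
// under a lib directory.
public class DuplicateClassFinder {
    public static void main(String[] args) throws Exception {
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        Map<String, String> seen = new HashMap<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        for (File jar : jars) {
            try (JarFile jarFile = new JarFile(jar)) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String entry = entries.nextElement().getName();
                    if (!entry.endsWith(".class")) continue;
                    // put() returns the previous jar that held this class, if any
                    String previous = seen.put(entry, jar.getName());
                    if (previous != null && !previous.equals(jar.getName())) {
                        System.out.println(entry + " is in both " + previous
                                + " and " + jar.getName());
                    }
                }
            }
        }
    }
}
```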
12-13-2017 02:38 PM
Hi,

I have a problem running a jar via an Oozie shell action on a Kerberized cluster.

My jar contains the following code for authentication:

```java
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
try {
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
} catch (IOException e) {
    e.printStackTrace();
}
```

My workflow.xml is as follows:

```xml
<shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <exec>hadoop</exec>
    <argument>jar</argument>
    <argument>jarfile</argument>
    <argument>x.x.x.x.UnzipFile</argument>
    <argument>keytab</argument>
    <argument>${kerberosPrincipal}</argument>
    <argument>${nameNode}</argument>
    <argument>${zipFilePath}</argument>
    <argument>${unzippingDir}</argument>
    <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
    <file>${workdir}/lib/[keytabFileName]#keytab</file>
    <file>${workdir}/lib/[JarFileName]#jarfile</file>
</shell>
```

The jar file and the keytab are located in HDFS, in the /lib directory of the directory containing the workflow.xml.

The problem is that across otherwise identical runs of the Oozie workflow, I sometimes get this error:

```
java.io.IOException: Incomplete HDFS URI, no host: hdfs://[name_node_URI]:8020keytab
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
    at x.x.x.x.CompressedFilesUtilities.unzip(CompressedFilesUtilities.java:54)
    at x.x.x.x.UnzipFile.main(UnzipFile.java:13)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
```
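A note on the error message itself (a guess, since the unzip code isn't shown): `hdfs://...:8020keytab` looks like the name node URI and a relative file name concatenated without a separator. A minimal sketch of building the path with Hadoop's Path API instead, all names hypothetical:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: join the name node URI and a relative name with the
// Path API rather than string concatenation, which is what produces malformed
// URIs like "hdfs://host:8020keytab" (no '/' between authority and path).
public class HdfsPathJoin {
    public static void main(String[] args) throws Exception {
        String nameNode = args[0]; // e.g. hdfs://namenode.example.com:8020
        String relative = args[1]; // e.g. user/root/app/lib/some.file

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(nameNode), conf);
        Path joined = new Path(new Path(nameNode), relative); // inserts the '/'
        System.out.println(joined + " exists: " + fs.exists(joined));
    }
}
```

Also worth noting: the `#keytab` fragment in the `<file>` element creates a local symlink in the action's working directory, so `loginUserFromKeytab` expects that local name rather than an HDFS URI.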
Labels:
- Apache Oozie
08-03-2016 02:49 PM
Okay, I found a workaround: I added -Duser.timezone=GMT, which changes the JVM time zone. The final flume-ng command is as follows:

```
flume-ng agent --conf-file spool1.properties --name agent1 --conf $FLUME_HOME/conf -Duser.timezone=GMT
```

The directory needed by the Oozie coordinator is now being created.
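For anyone wondering why the flag works, a small illustrative sketch (not Flume code): date patterns like the ones behind %Y/%m/%d/%H are formatted in the JVM default time zone, which -Duser.timezone overrides at startup.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative only: the formatted directory name follows the JVM default
// time zone, so launching with -Duser.timezone=GMT shifts it from local
// CEST to UTC.
public class TimezoneDemo {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");
        System.out.println("default zone (" + TimeZone.getDefault().getID()
                + "): /flume/" + fmt.format(new Date()));
        // Run with: java -Duser.timezone=GMT TimezoneDemo
        // and the printed hour moves back by the CEST offset (2 hours).
    }
}
```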
08-03-2016 08:43 AM
Hi all,

I've created an Oozie coordinator with a synchronous dataset. The time on the cluster is set to CEST (GMT+2). I'm using Flume to collect data and create a directory in HDFS in this format: /flume/%Y/%m/%d/%H

coordinator.properties:

```
nameNode=hdfs://vm1.local:8020
jobTracker=vm1.local:8050
queueName=default
exampleDir=${nameNode}/user/root/oozie-wait
oozie.use.system.libpath=true
start=2016-08-03T08:01Z
end=2016-08-03T12:06Z
workflowAppUri=${exampleDir}/app
oozie.coord.application.path=${exampleDir}/app
```

coordinator.xml:

```xml
<coordinator-app name="every-hour-waitForData" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="ratings" frequency="${coord:hours(1)}" initial-instance="${start}" timezone="Europe/Paris">
      <uri-template>hdfs://vm1.local:8020/user/root/flume/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="ratings">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
      <configuration>
        <property>
          <name>wfInput</name>
          <value>${coord:dataIn('coordInput1')}</value>
        </property>
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

When running this example, Flume creates the directory /user/root/flume/2016/08/03/10/, but the coordinator is waiting for /user/root/flume/2016/08/03/08.

Does anyone know how to make Flume create the directory in UTC, or how to make the coordinator read the correct directory?

Thanks.
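To make the two-hour gap concrete, here is a small illustrative computation (plain java.time, not Oozie internals): the same nominal instant formats to the two different directories above, depending on the time zone used.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative: one instant, two directory names. The coordinator resolves
// the uri-template from the nominal time, while Flume names directories with
// the local (CEST, UTC+2) wall clock.
public class InstanceMismatch {
    public static void main(String[] args) {
        ZonedDateTime nominal = ZonedDateTime.parse("2016-08-03T08:01:00Z");
        DateTimeFormatter dirFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");
        System.out.println("coordinator waits for: /user/root/flume/"
                + nominal.withZoneSameInstant(ZoneId.of("UTC")).format(dirFormat));
        System.out.println("flume (CEST) creates:  /user/root/flume/"
                + nominal.withZoneSameInstant(ZoneId.of("Europe/Paris")).format(dirFormat));
    }
}
```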
Labels:
- Apache Flume
- Apache Oozie
07-27-2016 09:17 AM
							 Thank you @Michael M and @Alexander Bij for your valuable help. 
07-26-2016 10:52 AM
Problem solved: I changed the channel type from file to memory:

```
agent1.channels.channel2.type = memory
```

Answers about how to make it work with a file channel are still welcome.
07-26-2016 09:28 AM
Hi,

I'm using Flume to collect data from a spool directory. My configuration is as follows:

```
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2
agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2
agent1.sources.source1.type = spooldir
agent1.sources.source1.basenameHeader = true
agent1.sources.source1.spoolDir = /root/flume_example/spooldir
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/root/flume
agent1.sinks.sink1.hdfs.filePrefix = %{basename}
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.hdfs.idleTimeout = 5
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 100000
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel2.type = file
```

When I place a 43 MB file in the spool directory, Flume starts writing files into the HDFS directory /user/root/flume:

```
-rw-r--r--   3 root hdfs      7.9 M 2016-07-26 11:10 /user/root/flume/filename.csv.1469524239209.csv
-rw-r--r--   3 root hdfs      7.6 M 2016-07-26 11:11 /user/root/flume/filename.csv.1469524239210.csv
```

But then a java.lang.OutOfMemoryError: Java heap space is raised:

```
ERROR channel.ChannelProcessor: Error while writing to required channel: FileChannel channel2 { dataDirs: [/root/.flume/file-channel/data] }
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:703)
    at java.util.HashMap.putVal(HashMap.java:662)
    at java.util.HashMap.put(HashMap.java:611)
    at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
    at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
    at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
    at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
    at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
    at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
    at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/07/26 11:10:59 ERROR source.SpoolDirectorySource: FATAL: Spool Directory source source1: { spoolDir: /root/flume_example/spooldir }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:703)
    at java.util.HashMap.putVal(HashMap.java:662)
    at java.util.HashMap.put(HashMap.java:611)
    at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
    at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
    at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
    at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
    at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
    at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
    at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

Any idea how I can fix this issue?

Thanks.
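Not a fix, but a quick way to check one common cause: the file channel keeps its event-queue index in the agent's heap (visible in the EventQueueBackingStoreFile/HashMap frames above), and flume-ng agents are often launched with a small heap unless JAVA_OPTS is raised (e.g. in flume-env.sh; exact values depend on the deployment). A tiny illustrative check of what the JVM actually got:

```java
// Illustrative check, not Flume code: print the heap ceiling of the current
// JVM. The file channel's in-heap event-queue index is what overflows in the
// stack traces above when this ceiling is small.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM max heap: " + maxMb + " MB");
    }
}
```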
Labels:
- Apache Flume
07-20-2016 11:06 AM
Okay, I installed the NodeManager on the 3 remaining nodes, and now all the nodes are active.
07-20-2016 10:41 AM
Hi, I have a cluster with 4 nodes (NameNode: 8 GB RAM, 3 DataNodes with 4 GB RAM each). In the Resource Manager UI I'm getting only one active node. Is this normal?

Thanks.
Labels:
- Cloudera Manager