Member since: 02-08-2016
Posts: 36
Kudos Received: 18
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1435 | 12-14-2017 03:09 PM |
| | 2386 | 08-03-2016 02:49 PM |
| | 4413 | 07-26-2016 10:52 AM |
| | 3778 | 03-07-2016 12:47 PM |
12-14-2017
03:09 PM
Thank you @Matt Andruff for your reply. I resolved the issue. I had another .jar in the /lib directory containing the same code but with a different file name. I'm not sure how it affected the execution of the job, but after removing it everything works fine, for now at least.
12-13-2017
02:38 PM
Hi, I have a problem running a jar using an Oozie shell action in a kerberized cluster. My jar has the following code for authentication:

Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
try {
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
} catch (IOException e) {
    e.printStackTrace();
}

My workflow.xml is as follows:

<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${resourceManager}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>hadoop</exec>
<argument>jar</argument>
<argument>jarfile</argument>
<argument>x.x.x.x.UnzipFile</argument>
<argument>keytab</argument>
<argument>${kerberosPrincipal}</argument>
<argument>${nameNode}</argument>
<argument>${zipFilePath}</argument>
<argument>${unzippingDir}</argument>
<env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
<file>${workdir}/lib/[keytabFileName]#keytab</file>
<file>${workdir}/lib/[JarFileName]#jarfile</file>
</shell>

The jar file and the keytab are located in HDFS in the /lib directory of the directory where the workflow.xml is located. The problem is that on various identical runs of the Oozie workflow I sometimes get this error:

java.io.IOException: Incomplete HDFS URI, no host: hdfs://[name_node_URI]:8020keytab
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at x.x.x.x.CompressedFilesUtilities.unzip(CompressedFilesUtilities.java:54)
at x.x.x.x.UnzipFile.main(UnzipFile.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
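
For context, this is roughly the shape of the entry point being launched; it is a hypothetical sketch reconstructed from the <argument> list above, not the actual code in the jar:

// Hypothetical reconstruction; argument order, class name, and path handling are assumptions.
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class UnzipFile {
    public static void main(String[] args) throws IOException {
        String keytabPath = args[0]; // "keytab" -> local symlink created by <file>...#keytab
        String principal  = args[1]; // ${kerberosPrincipal}
        String nameNode   = args[2]; // ${nameNode}, e.g. hdfs://host:8020
        String zipFile    = args[3]; // ${zipFilePath}
        String targetDir  = args[4]; // ${unzippingDir}

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // loginUserFromKeytab needs a *local* path; the "keytab" symlink in the
        // shell action's working directory satisfies that.
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);

        // Building HDFS paths by plain string concatenation (nameNode + "keytab")
        // yields URIs like hdfs://host:8020keytab, which fail with exactly the
        // "Incomplete HDFS URI, no host" error shown above; resolving paths
        // against the nameNode URI explicitly avoids that.
        FileSystem fs = FileSystem.get(URI.create(nameNode), conf);
        System.out.println(zipFile + " exists: " + fs.exists(new Path(zipFile))
                + ", unzip target: " + targetDir);
    }
}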
Labels:
- Apache Oozie
08-03-2016
02:49 PM
Okay, I found a workaround: I added -Duser.timezone=GMT, which changes the JVM time zone. The final flume-ng command is as follows:

flume-ng agent --conf-file spool1.properties --name agent1 --conf $FLUME_HOME/conf -Duser.timezone=GMT

The directory needed by the Oozie coordinator is now being created.
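
As a sanity check, the effect of the flag can be seen with a tiny standalone snippet (hypothetical, not Flume code): -Duser.timezone sets the JVM default time zone, which date formatting falls back to when no explicit zone is configured.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimezoneCheck {
    public static void main(String[] args) {
        // Run with: java -Duser.timezone=GMT TimezoneCheck
        System.out.println("JVM default zone: " + TimeZone.getDefault().getID());
        // SimpleDateFormat uses the default zone unless setTimeZone(...) is called,
        // so hourly bucket paths follow whatever -Duser.timezone selects.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");
        System.out.println("Hourly bucket now: " + fmt.format(new Date()));
    }
}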
08-03-2016
08:43 AM
Hi all, I've created an Oozie coordinator with a synchronous dataset. The time in the cluster is set to CEST (GMT+2). I'm using Flume to collect data and create a directory in HDFS in this format: /flume/%Y/%m/%d/%H

coordinator.properties:

nameNode=hdfs://vm1.local:8020
jobTracker=vm1.local:8050
queueName=default
exampleDir=${nameNode}/user/root/oozie-wait
oozie.use.system.libpath = true
start=2016-08-03T08:01Z
end=2016-08-03T12:06Z
workflowAppUri=${exampleDir}/app
oozie.coord.application.path=${exampleDir}/app

coordinator.xml:

<coordinator-app name="every-hour-waitForData" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC"
xmlns="uri:oozie:coordinator:0.1">
<datasets>
<dataset name="ratings" frequency="${coord:hours(1)}" initial-instance="${start}" timezone="Europe/Paris">
<uri-template>hdfs://vm1.local:8020/user/root/flume/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
</dataset>
</datasets>
<input-events>
<data-in name="coordInput1" dataset="ratings">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>wfInput</name>
<value>${coord:dataIn('coordInput1')}</value>
</property>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
When running this example, Flume creates the directory /user/root/flume/2016/08/03/10/ but the coordinator is waiting for /user/root/flume/2016/08/03/08. Does anyone know how to make Flume create the directory in UTC, or how to make the coordinator read the correct directory? Thanks.
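
To make the mismatch concrete, here is a small standalone Java illustration (not part of the setup above) of how the same instant formats to two different hourly directories under the cluster's local zone and under UTC:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class BucketMismatchDemo {
    public static void main(String[] args) {
        Date now = new Date();
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");

        fmt.setTimeZone(TimeZone.getTimeZone("Europe/Paris")); // CEST host running Flume
        System.out.println("Flume writes:        /user/root/flume/" + fmt.format(now));

        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));          // coordinator nominal time
        System.out.println("Coordinator expects: /user/root/flume/" + fmt.format(now));
    }
}

During CEST the two paths differ by two hours (e.g. .../2016/08/03/10 vs .../2016/08/03/08), which is exactly the gap described above.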
Labels:
- Apache Flume
- Apache Oozie
07-27-2016
09:17 AM
Thank you @Michael M and @Alexander Bij for your valuable help.
07-26-2016
10:52 AM
Problem solved: I changed the channel type from file to memory:

agent1.channels.channel2.type = memory

Answers on how to make it work with a file channel are still welcome.
07-26-2016
09:28 AM
Hi, I'm using Flume to collect data from a Spool Directory. My configuration is as follows:

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2
agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2
agent1.sources.source1.type = spooldir
agent1.sources.source1.basenameHeader = true
agent1.sources.source1.spoolDir = /root/flume_example/spooldir
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/root/flume
agent1.sinks.sink1.hdfs.filePrefix = %{basename}
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.hdfs.idleTimeout = 5
agent1.sinks.sink1.hdfs.rollSize = 0
agent1.sinks.sink1.hdfs.rollCount = 100000
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel2.type = file

When placing a 43 MB file in the spool directory, Flume starts writing files into the HDFS directory /user/root/flume:

-rw-r--r-- 3 root hdfs 7.9 M 2016-07-26 11:10 /user/root/flume/filename.csv.1469524239209.csv
-rw-r--r-- 3 root hdfs 7.6 M 2016-07-26 11:11 /user/root/flume/filename.csv.1469524239210.csv

But a java.lang.OutOfMemoryError: Java heap space error is raised:

ERROR channel.ChannelProcessor: Error while writing to required channel: FileChannel channel2 { dataDirs: [/root/.flume/file-channel/data] }
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/07/26 11:10:59 ERROR source.SpoolDirectorySource: FATAL: Spool Directory source source1: { spoolDir: /root/flume_example/spooldir }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at org.apache.flume.channel.file.EventQueueBackingStoreFile.put(EventQueueBackingStoreFile.java:338)
at org.apache.flume.channel.file.FlumeEventQueue.set(FlumeEventQueue.java:287)
at org.apache.flume.channel.file.FlumeEventQueue.add(FlumeEventQueue.java:317)
at org.apache.flume.channel.file.FlumeEventQueue.addTail(FlumeEventQueue.java:211)
at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doCommit(FileChannel.java:553)
at org.apache.flume.channel.BasicTransactionSemantics.commit(BasicTransactionSemantics.java:151)
at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:192)
at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any idea how I can fix this issue? Thanks.
Labels:
- Apache Flume
07-20-2016
11:06 AM
Okay, I installed the NodeManager on the 3 remaining nodes and now all the nodes are active.
07-20-2016
10:41 AM
Hi, I have a cluster with 4 nodes (NameNode: 8 GB RAM, 3 DataNodes with 4 GB RAM). In the Resource Manager UI I'm getting only one active node. Is this normal? Thanks.
Labels:
- Cloudera Manager