Created 08-09-2016 11:58 PM
I am in the process of switching to Oozie for our periodic tasks. I started with a regular workflow and finally got it working. My workflow.xml looks like this:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="clean-opens">
  <start to="mr-clean-opens" />
  <action name="mr-clean-opens">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>oozie.use.system.libpath</name>
          <value>true</value>
        </property>
        <property>
          <name>mapred.mapper.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>mapred.reducer.new-api</name>
          <value>true</value>
        </property>
        <property>
          <name>oozie.action.sharelib.for.map-reduce</name>
          <value>zookeeper,hbase,myapp</value>
        </property>
      </configuration>
      <config-class>x.y.z.CleanOpensProcessConfiguration</config-class>
    </map-reduce>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>CleanOpensProcess failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end" />
</workflow-app>
Note that "hbase", "zookeeper" and "myapp" are my own sharelibs. First two are obvious, the last one is, actually, my application JAR. I have a bunch of MR jobs in that jar so I was thinking about just making it a sharelib and using it from different workflows.
I run it using the following property file:
jobTracker=hadoop-node02.stage.mydomain.net:8050
nameNode=hdfs://hadoop-node01.stage.mydomain.net:8020
applicationPath=${nameNode}/apps/myapp
queueName=default
mapreduce.framework.name=yarn
oozie.wf.application.path=${applicationPath}
oozie.use.system.libpath=true
This WORKS. I have workflow.xml in /apps/myapp, MR job runs fine.
Now I want to switch to coordinator-driven execution, once an hour. I have created this coordinator.xml:
<coordinator-app name="clean-opens-coordinator" start="2016-08-01T00:00Z" end="2020-06-01T00:00Z" frequency="10 * * * *" timezone="America/Montreal" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${applicationPath}</app-path>
      <configuration>
        <property>
          <name>oozie.action.sharelib.for.map-reduce</name>
          <value>zookeeper,hbase,myapp</value>
        </property>
        <property>
          <name>queueName</name>
          <value>${queueName}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
I have placed this XML in /apps/myapp-coordinator/coordinator.xml and tried to use this coordinator.properties to deploy that job:
jobTracker=hadoop-node02.stage.mydomain.net:8050
nameNode=hdfs://hadoop-node01.stage.mydomain.net:8020
applicationPath=${nameNode}/apps/myapp
coordinatorPath=${nameNode}/apps/myapp-coordinator/coordinator.xml
queueName=default
oozie.action.sharelib.for.map-reduce=zookeeper,hbase,myapp
mapreduce.framework.name=yarn
oozie.coord.application.path=${coordinatorPath}
When the job runs, I get ClassNotFoundException for my x.y.z.CleanOpensProcessConfiguration class. As you can see, I have even tried to populate this "oozie.action.sharelib.for.map-reduce" property in multiple files without any luck.
Again, launching the workflow works fine, but launching it through coordinator does not work because it cannot find the class it needs. I am wondering if there is something I am missing about how coordinator launches the workflow. Will appreciate any suggestions.
Created 08-10-2016 12:28 AM
Can you please let us know whether you are using the same workflow.xml for this coordinator? I see that you have set the property below in your coordinator.xml:
<property>
  <name>oozie.action.sharelib.for.map-reduce</name>
  <value>zookeeper,hbase,myapp</value>
</property>
This property works for your workflow.xml because it contains a 'map-reduce' action (note that the suffix after "for" in the property name is the action type, not the name you gave the action).
e.g. - oozie.action.sharelib.for.<action-type>
Also, please add the property below to your job.properties:
oozie.use.system.libpath=true
An alternative is to place the required jars in the ${applicationPath}/lib directory. Oozie picks up those jars automatically.
Please do let us know if it still fails with provided info.
Created 08-10-2016 12:43 AM
Yes, I am using exactly the same workflow.xml in the same location for both tests: when running just the workflow itself (oozie job -run -debug -config workflow.properties) and when running it through the coordinator (oozie job -run -debug -config coordinator.properties). I am not interested in copying any JARs to the lib directory, since avoiding that is the whole point of the sharelib 🙂
I have added "oozie.use.system.libpath=true" to my coordinator.properties (the config used to launch the coordinator job) and it seems that was exactly what was missing! I see, the sharelib is not added to the classpath unless this property is set to true.
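For anyone landing here later, this is roughly what the coordinator.properties looks like after the fix. It is only a sketch reassembled from the values posted earlier in this thread, with the one new line marked; host names and paths are specific to my cluster and will differ on yours.

```properties
jobTracker=hadoop-node02.stage.mydomain.net:8050
nameNode=hdfs://hadoop-node01.stage.mydomain.net:8020
applicationPath=${nameNode}/apps/myapp
coordinatorPath=${nameNode}/apps/myapp-coordinator/coordinator.xml
queueName=default
oozie.action.sharelib.for.map-reduce=zookeeper,hbase,myapp
mapreduce.framework.name=yarn
oozie.coord.application.path=${coordinatorPath}
# The line that was missing: without it the sharelib JARs
# (including myapp) were not put on the action classpath.
oozie.use.system.libpath=true
```

The workflow.properties that worked from the start already had this line, which is why the plain workflow run could find the class and the coordinator run could not.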
Thanks!
Created 08-10-2016 04:31 AM
Glad to see that it worked! Please let us know in case of any further issues. HCC is always there for you! 🙂