Created 08-18-2016 02:38 AM
I have HDP 2.4 installed on CentOS 6.8 on six virtual KVM instances based on two physical machines. I have been having problems with Oozie jobs whose workflows call either Hive or Spark based actions. The error that I have encountered is:
2016-08-18 11:01:18,134 WARN ActionStartXCommand:523 - SERVER[hc1m1.nec.co.nz] USER[oozie] GROUP[-] TOKEN[] APP[PoleLocationsForNec] JOB[0000001-160818105046419-oozie-oozi-W] ACTION[0000001-160818105046419-oozie-oozi-W@hive-select-data] Error starting action [hive-select-data]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.] org.apache.oozie.action.ActionExecutorException: JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
I have encountered this error on jobs that previously worked, so I can't think what has changed. My workflow looks like this:
<workflow-app name='PoleLocationsForNec' xmlns="uri:oozie:workflow:0.5" xmlns:sla="uri:oozie:sla:0.2">
  <start to='hive-select-data'/>
  <action name="hive-select-data">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>select_pole_locations_for_nec.hql</script>
    </hive>
    <ok to="hdfs-move-file"/>
    <error to="fail"/>
  </action>
  <action name="hdfs-move-file">
    <fs>
      <move source='${sourceFile}' target='${targetFile}${SaveDateString}'/>
    </fs>
    <ok to="sqoop-copy"/>
    <error to="fail"/>
  </action>
  <action name="sqoop-copy">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <arg>export</arg>
      <arg>--options-file</arg>
      <arg>${sqoopOptFile}</arg>
      <file>${sqoopOptFile}</file>
    </sqoop>
    <ok to="cleanup"/>
    <error to="fail"/>
  </action>
  <action name="cleanup">
    <fs>
      <delete path='${triggerFileDir}'/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name='fail'>
    <message>An error occurred - message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name='end'/>
</workflow-app>
and my job configuration looks like this.
<configuration>
  <property><name>hdfsUser</name><value>oozie</value></property>
  <property><name>WaitForThisInputData</name><value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/trigger/</value></property>
  <property><name>wfPath</name><value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec/</value></property>
  <property><name>user.name</name><value>oozie</value></property>
  <property><name>sqoopOptFile</name><value>sqoop_options_file.opt</value></property>
  <property><name>oozie.coord.application.path</name><value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value></property>
  <property><name>sourceFile</name><value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/Pole_locations_for_NEC_edit.csv</value></property>
  <property><name>mapreduce.job.user.name</name><value>oozie</value></property>
  <property><name>Execution</name><value>FIFO</value></property>
  <property><name>triggerFileDir</name><value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/trigger/</value></property>
  <property><name>coordFreq</name><value>30</value></property>
  <property><name>Concurrency</name><value>1</value></property>
  <property><name>targetFile</name><value>/mule/sheets/store/Pole_locations_for_NEC_edit.csv</value></property>
  <property><name>jobTracker</name><value>hc1r1m2.nec.co.nz:8050</value></property>
  <property><name>startTime</name><value>2016-07-31T12:01Z</value></property>
  <property><name>wfProject</name><value>PoleLocationsForNec</value></property>
  <property><name>targetDir</name><value>/mule/sheets/store/</value></property>
  <property><name>dataFreq</name><value>30</value></property>
  <property><name>nameNode</name><value>hdfs://hc1m1.nec.co.nz:8020</value></property>
  <property><name>doneFlag</name><value>done_flag.dat</value></property>
  <property><name>oozie.libpath</name><value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/share/lib</value></property>
  <property><name>oozie.use.system.libpath</name><value>true</value></property>
  <property><name>oozie.wf.rerun.failnodes</name><value>true</value></property>
  <property><name>moveFile</name><value>Pole_locations_for_NEC_edit.csv</value></property>
  <property><name>SaveDateString</name><value>-20160817-230100</value></property>
  <property><name>triggerDir</name><value>trigger/</value></property>
  <property><name>sourceDir</name><value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/</value></property>
  <property><name>oozie.wf.application.path</name><value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value></property>
  <property><name>endTime</name><value>2099-01-01T12:00Z</value></property>
  <property><name>TimeOutMins</name><value>10</value></property>
  <property><name>timeZoneDef</name><value>GMT+12:00</value></property>
  <property><name>workflowAppPath</name><value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value></property>
</configuration>
I have obviously researched this issue and understand that it relates to the definition of mapreduce.framework.name or to the HDFS / resource manager server addresses. But given that this job has worked in the past, I thought that this error might be masking another issue. The value of mapreduce.framework.name is defined in the following files:
/etc/hadoop/conf/mapred-site.xml => yarn-tez
/etc/hive/conf/mapred-site.xml => yarn-tez
/etc/oozie/conf/hadoop-config.xml => yarn
/etc/pig/conf/pig-env.sh => yarn
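For anyone wanting to reproduce this check, the values above can be pulled out of the client configs in one pass. This is a minimal sketch; the helper name is mine, the paths are the standard HDP client locations listed above, and the grep naively assumes `<name>` and `<value>` sit on adjacent lines:

```shell
# Print the mapreduce.framework.name value seen in each config file passed in.
check_framework_name() {
  for f in "$@"; do
    echo "== $f"
    # grab the line after the property name, then keep only the <value> line
    grep -A 1 'mapreduce.framework.name' "$f" 2>/dev/null | grep '<value>'
  done
}

# Usage on the nodes in question:
# check_framework_name /etc/hadoop/conf/mapred-site.xml \
#                      /etc/hive/conf/mapred-site.xml \
#                      /etc/oozie/conf/hadoop-config.xml
```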
I have checked all of the logs, but all I see is the JA009 error in the Oozie logs. I am just wondering whether anyone else has encountered this error or can suggest another area that I can examine.
Created 08-18-2016 02:46 AM
The value of the mapreduce.framework.name properties listed above looks odd to me.
It should be just 'yarn'.
Can you please change it to yarn, restart the required services via Ambari, and see if that helps?
The rest of the configuration looks good to me.
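The suggested change could be applied on the command line like this. A sketch only: on an Ambari-managed cluster the durable fix is to change the property in Ambari itself, since locally edited files may be overwritten at the next config deployment, and the sed below naively replaces any yarn-tez value in the file:

```shell
# Flip mapreduce.framework.name from yarn-tez to yarn in a client config file,
# keeping a backup first. Hypothetical helper; run as root on each node.
fix_framework_name() {
  for f in "$@"; do
    cp "$f" "$f.bak"   # back up before editing
    sed -i 's|<value>yarn-tez</value>|<value>yarn</value>|' "$f"
  done
}

# Usage (as root):
# fix_framework_name /etc/hadoop/conf/mapred-site.xml /etc/hive/conf/mapred-site.xml
```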
Created 08-18-2016 04:01 AM
I think that the yarn-tez value is set by default because Tez is integrated into HDP 2.4, and the Tez install guide at
https://tez.apache.org/install.html states in step 7 that this value should be set to yarn-tez. I'll try the change; you never know, there might be a conflict.
Created 08-18-2016 06:26 AM
@Mike Frampton - Sure. Please keep us posted.
Created 08-19-2016 02:18 AM
I changed the entries in the files listed above from yarn-tez to yarn and bounced Oozie, Hive and YARN. No luck. Then I decided to remove YARN using the Ambari REST interface. I didn't manage to do that, but I had stopped Oozie in the attempt. On restarting Oozie the problem was cleared and my Oozie workflows now execute. Strange.
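For the record, the stop/start that cleared this can be scripted against the Ambari REST API. A sketch under assumptions: the host, port 8080, admin:admin credentials, and cluster name are placeholders for this environment, and INSTALLED is Ambari's state name for a stopped service:

```shell
# Hypothetical helper: set an Ambari-managed service to a target state.
# Ambari treats state=INSTALLED as "stopped" and state=STARTED as "running".
ambari_service_state() {
  local cluster="$1" service="$2" state="$3"
  curl -s -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
    -d "{\"RequestInfo\":{\"context\":\"Set $service to $state\"},\"Body\":{\"ServiceInfo\":{\"state\":\"$state\"}}}" \
    "http://hc1m1.nec.co.nz:8080/api/v1/clusters/$cluster/services/$service"
}

# Stop, then start, Oozie (cluster name "hc1" is a placeholder):
# ambari_service_state hc1 OOZIE INSTALLED
# ambari_service_state hc1 OOZIE STARTED
```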
Created 08-19-2016 02:32 AM
@Mike Frampton - If you restarted Oozie first, before YARN, there is a chance that the first restart still read the old configs. You then restarted YARN and Hive, which deployed the updated hive-site.xml and yarn-site.xml/mapred-site.xml, and the next Oozie restart picked up the updated config files. I could be wrong, but this is one possible reason.
Created 08-29-2016 01:30 AM
@Mike Frampton - Is this resolved? If yes, can you please accept the appropriate answer?