HDP 2.4 Oozie workflow JA009: Cannot initialize Cluster error

I have HDP 2.4 installed on CentOS 6.8 on six virtual KVM instances hosted on two physical machines. I have been having problems with Oozie jobs whose workflows call either Hive or Spark based actions. The error that I have encountered is:

2016-08-18 11:01:18,134  WARN ActionStartXCommand:523 - SERVER[hc1m1.nec.co.nz] USER[oozie] GROUP[-] TOKEN[] APP[PoleLocationsForNec] JOB[0000001-160818105046419-oozie-oozi-W] ACTION[0000001-160818105046419-oozie-oozi-W@hive-select-data] Error starting action [hive-select-data]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.]
org.apache.oozie.action.ActionExecutorException: JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

I have encountered this error on jobs that previously worked, so I can't think what has changed. My workflow looks like this:

<workflow-app name='PoleLocationsForNec'
  xmlns="uri:oozie:workflow:0.5"
  xmlns:sla="uri:oozie:sla:0.2">
  <start to='hive-select-data'/>
  <action name="hive-select-data">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>select_pole_locations_for_nec.hql</script>
    </hive>
    <ok to="hdfs-move-file"/>
    <error to="fail"/>
  </action>
  <action name="hdfs-move-file">
    <fs>
      <move source='${sourceFile}' target='${targetFile}${SaveDateString}'/>
    </fs>
    <ok to="sqoop-copy"/>
    <error to="fail"/>
  </action>
  <action name="sqoop-copy">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <arg>export</arg>
      <arg>--options-file</arg>
      <arg>${sqoopOptFile}</arg>
      <file>${sqoopOptFile}</file>
    </sqoop>
    <ok to="cleanup"/>
    <error to="fail"/>
  </action>
  <action name="cleanup">
    <fs>
      <delete path='${triggerFileDir}'/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name='fail'>
    <message>An error occurred - message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name='end'/>
</workflow-app>

and my job configuration looks like this:

<configuration>
  <property>
    <name>hdfsUser</name>
    <value>oozie</value>
  </property>
  <property>
    <name>WaitForThisInputData</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/trigger/</value>
  </property>
  <property>
    <name>wfPath</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec/</value>
  </property>
  <property>
    <name>user.name</name>
    <value>oozie</value>
  </property>
  <property>
    <name>sqoopOptFile</name>
    <value>sqoop_options_file.opt</value>
  </property>
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value>
  </property>
  <property>
    <name>sourceFile</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/Pole_locations_for_NEC_edit.csv</value>
  </property>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>oozie</value>
  </property>
  <property>
    <name>Execution</name>
    <value>FIFO</value>
  </property>
  <property>
    <name>triggerFileDir</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/trigger/</value>
  </property>
  <property>
    <name>coordFreq</name>
    <value>30</value>
  </property>
  <property>
    <name>Concurrency</name>
    <value>1</value>
  </property>
  <property>
    <name>targetFile</name>
    <value>/mule/sheets/store/Pole_locations_for_NEC_edit.csv</value>
  </property>
  <property>
    <name>jobTracker</name>
    <value>hc1r1m2.nec.co.nz:8050</value>
  </property>
  <property>
    <name>startTime</name>
    <value>2016-07-31T12:01Z</value>
  </property>
  <property>
    <name>wfProject</name>
    <value>PoleLocationsForNec</value>
  </property>
  <property>
    <name>targetDir</name>
    <value>/mule/sheets/store/</value>
  </property>
  <property>
    <name>dataFreq</name>
    <value>30</value>
  </property>
  <property>
    <name>nameNode</name>
    <value>hdfs://hc1m1.nec.co.nz:8020</value>
  </property>
  <property>
    <name>doneFlag</name>
    <value>done_flag.dat</value>
  </property>
  <property>
    <name>oozie.libpath</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/share/lib</value>
  </property>
  <property>
    <name>oozie.use.system.libpath</name>
    <value>true</value>
  </property>
  <property>
    <name>oozie.wf.rerun.failnodes</name>
    <value>true</value>
  </property>
  <property>
    <name>moveFile</name>
    <value>Pole_locations_for_NEC_edit.csv</value>
  </property>
  <property>
    <name>SaveDateString</name>
    <value>-20160817-230100</value>
  </property>
  <property>
    <name>triggerDir</name>
    <value>trigger/</value>
  </property>
  <property>
    <name>sourceDir</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/mule/sheets/input/PoleLocationsForNec/</value>
  </property>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value>
  </property>
  <property>
    <name>endTime</name>
    <value>2099-01-01T12:00Z</value>
  </property>
  <property>
    <name>TimeOutMins</name>
    <value>10</value>
  </property>
  <property>
    <name>timeZoneDef</name>
    <value>GMT+12:00</value>
  </property>
  <property>
    <name>workflowAppPath</name>
    <value>hdfs://hc1m1.nec.co.nz:8020/user/oozie/wf/PoleLocationsForNec</value>
  </property>
</configuration>

I have obviously researched this issue and understand that it relates to the definition of mapreduce.framework.name or the HDFS / Resource Manager server addresses. But given that this job has worked in the past, I thought that this error might be masking another issue. The value of mapreduce.framework.name is defined in the following files:

/etc/hadoop/conf/mapred-site.xml  => yarn-tez
/etc/hive/conf/mapred-site.xml  => yarn-tez
/etc/oozie/conf/hadoop-config.xml  => yarn
/etc/pig/conf/pig-env.sh  => yarn
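
In other words, the entries the error message is complaining about boil down to something like this (a sketch; the yarn.resourcemanager.address value is inferred from the jobTracker setting in my job configuration above):

<!-- mapred-site.xml: which framework job submission should use -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>

<!-- yarn-site.xml: the "corresponding server address" the error message refers to -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hc1r1m2.nec.co.nz:8050</value>
</property>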

I have checked all of the logs, but all I see is the JA009 error in the Oozie logs. I am just wondering whether anyone else has encountered this error or can suggest another area that I can examine.

1 ACCEPTED SOLUTION

Master Guru
@Mike Frampton

The values of the properties below look odd to me:

  1. /etc/hadoop/conf/mapred-site.xml => yarn-tez
  2. /etc/hive/conf/mapred-site.xml => yarn-tez

The value should be just 'yarn'.

Can you please change it to yarn, restart the required services via Ambari, and see if that helps?

The rest of the configuration looks good to me.
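
A minimal sketch of what the corrected entry would look like in both mapred-site.xml files (on an Ambari-managed cluster, make the change through Ambari so it is not overwritten on the next config push):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>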

6 REPLIES

I think that the yarn-tez value is set by default because Tez is integrated into HDP 2.4, and the Tez install guide at https://tez.apache.org/install.html states in step 7 that this value should be set to yarn-tez. I'll try the change; you never know, there might be a conflict.
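
From memory, the guide's step 7 amounts to entries along these lines (the tez.lib.uris path is only an example; the exact tarball location depends on the install):

<!-- mapred-site.xml, per the Tez install guide -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>

<!-- tez-site.xml: where the Tez runtime tarball lives in HDFS (example path) -->
<property>
  <name>tez.lib.uris</name>
  <value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
</property>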

Master Guru

@Mike Frampton - Sure. Please keep us posted.

I changed the entries in the files listed above from yarn-tez to yarn and bounced Oozie, Hive and YARN. No luck. Then I decided to remove YARN using the Ambari REST interface. I didn't manage to do that, but I had stopped Oozie in order to do it. On restarting Oozie, the problem was cleared and my Oozie workflows now execute. Strange...

Master Guru

@Mike Frampton - If you restarted Oozie before YARN, there is a chance that the first restart still read the old configs. Restarting YARN and Hive then deployed the updated hive-site.xml and yarn-site.xml/mapred-site.xml, so the next Oozie restart picked up the updated config files. I could be wrong, but this is one possible reason.

Master Guru

@Mike Frampton - Is this resolved? If so, can you please accept the appropriate answer?