Created 10-19-2019 01:52 AM
Hello.
I have started developing my own coordinator. The problem is that after writing an Oozie workflow with a Spark action and submitting it to an Oozie 4.2.0 server, YARN keeps reporting the job as a MapReduce job and it fails. Below is my workflow.xml (and, after it, a sketch of the coordinator I am using).
<workflow-app name="ch08_spark_max_rainfall" xmlns="uri:oozie:workflow:0.5">
    <start to="max_rainfall"/>
    <action name="max_rainfall">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>"Spark Ch08 Max Rain Calculator"</name>
            <class>life.jugnu.learnoozie.ch08.MaxRainfall</class>
            <jar>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/rainbow/target/scala-2.11/rainbow-assembly-1.0.19.jar</jar>
            <spark-opts>
                --conf spark.yarn.historyServer.address=http://sandbox-hdp.hortonworks.com:18088
                --conf spark.eventLog.dir=hdfs://sandbox-hdp.hortonworks.com:8020/user/spark/applicationHistory
                --conf spark.eventLog.enabled=true
            </spark-opts>
            <arg>${input}</arg>
            <arg>${output}</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>
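For context, since the failure only shows up when the coordinator submits this workflow, here is a simplified sketch of the coordinator I am developing. The frequency, start/end dates, and app-path below are illustrative placeholders rather than my exact values; the input/output values match the arguments my job actually receives:

<coordinator-app name="ch08_spark_max_rainfall_coord" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2015-02-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <!-- placeholder path; points at the directory holding the workflow.xml above -->
            <app-path>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/max_rainfall</app-path>
            <configuration>
                <property>
                    <name>input</name>
                    <value>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch05/input/rainfall/2015/01/</value>
                </property>
                <property>
                    <name>output</name>
                    <value>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/processed/max_rainfall/2015/01/</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>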
Here is a screenshot taken from the Oozie UI and the YARN UI.
Lastly, here is the specific error message that causes my program to fail.
2019-10-19 08:38:18,270 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1571462262129_0020_000002
2019-10-19 08:38:18,424 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-19 08:38:18,445 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.security.SecurityUtil.setConfiguration(Lorg/apache/hadoop/conf/Configuration;)V
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1572)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
2019-10-19 08:38:18,447 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
At first, I thought the cause might be duplicate hadoop-mapreduce JAR files on the server confusing YARN and triggering this error, so I rebuilt my Scala source with sbt-assembly, but that did not help at all. I also suspected a bug in my Scala program itself, but after spark-submitting it on the remote Docker container it runs flawlessly, writes its output to HDFS, and YARN correctly shows the application type as "SPARK".
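To be concrete about what I tried with sbt-assembly: my thinking was that if the duplicate-jar theory were right, marking the Spark and Hadoop dependencies as "provided" should keep their classes out of the assembly jar entirely, so they cannot clash with the versions already on the cluster. A sketch of the relevant build.sbt lines (the version numbers are illustrative guesses for the sandbox, not confirmed):

// build.sbt (sketch): name/version match the rainbow-assembly-1.0.19.jar above
name := "rainbow"
version := "1.0.19"
scalaVersion := "2.11.12"

// "provided" keeps these classes out of the fat jar built by sbt-assembly;
// the cluster supplies its own Spark and Hadoop jars at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"    % "2.3.0" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.7.3" % "provided"
)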
What could be the cause of this problem? Please help.
Created 10-19-2019 06:07 AM
After carefully comparing the successful workflow run with the failed coordinator run, I figured out that Oozie keeps adding a <configuration> element to the action configuration.
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>sandbox-hdp.hortonworks.com:8032</job-tracker>
    <name-node>hdfs://sandbox-hdp.hortonworks.com:8020</name-node>
    <master>yarn</master>
    <mode>cluster</mode>
    <name>Spark Ch08 Max Rain Calculator</name>
    <jar>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/spark_rainfall/lib/MaxRainfall.py</jar>
    <spark-opts>--conf spark.yarn.historyServer.address=http://sandbox-hdp.hortonworks.com:18088
        --conf spark.eventLog.dir=hdfs://sandbox-hdp.hortonworks.com:8020/user/spark/applicationHistory
        --conf spark.eventLog.enabled=true
    </spark-opts>
    <arg>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch05/input/rainfall/2015/01/</arg>
    <arg>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/processed/max_rainfall/2015/01/</arg>
    <configuration>
        <property xmlns="">
            <name>oozie.use.system.libpath</name>
            <value>True</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>nameNode</name>
            <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>jobTracker</name>
            <value>sandbox-hdp.hortonworks.com:8032</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>oozie.libpath</name>
            <value>hdfs://sandbox-hdp.hortonworks.com:8020/user/oozie/share/lib</value>
            <source>programatically</source>
        </property>
    </configuration>
</spark>
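For comparison, when I submit the plain workflow directly, these same values come from my job.properties rather than from an injected <configuration> block. Roughly like this (the application path is a placeholder; note that here oozie.use.system.libpath is the usual lowercase true, whereas the injected property above says True):

# job.properties used for the direct (successful) workflow submission
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
master=yarn
oozie.use.system.libpath=true
input=hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch05/input/rainfall/2015/01/
output=hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/processed/max_rainfall/2015/01/
# placeholder path to the directory holding workflow.xml
oozie.wf.application.path=${nameNode}/user/hue/learn_oozie/ch08/max_rainfall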
I don't think the <configuration> element is necessary for running the Spark app, and I suspect it is what makes the Spark action crash.