Spark coordinator job submitted but MapReduce job is shown on Yarn.

New Contributor

Hello.

I started developing my own coordinator. After writing the Oozie workflow and submitting it to an Oozie 4.2.0 server with a Spark action, YARN on my server keeps reporting it as a MapReduce job, and the job fails. Below is the configuration in my workflow.xml file.

 

<workflow-app name="ch08_spark_max_rainfall"
              xmlns="uri:oozie:workflow:0.5">

    <start to="max_rainfall"/>

    <action name="max_rainfall">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>"Spark Ch08 Max Rain Calculator"</name>
            <class>life.jugnu.learnoozie.ch08.MaxRainfall</class>
            <jar>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/rainbow/target/scala-2.11/rainbow-assembly-1.0.19.jar</jar>
            <spark-opts>
                --conf spark.yarn.historyServer.address=http://sandbox-hdp.hortonworks.com:18088
                --conf spark.eventLog.dir=hdfs://sandbox-hdp.hortonworks.com:8020/user/spark/applicationHistory
                --conf spark.eventLog.enabled=true
            </spark-opts>
            <arg>${input}</arg>
            <arg>${output}</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>

    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="End"/>
</workflow-app>
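The workflow above references the parameters ${jobTracker}, ${nameNode}, ${master}, ${input}, and ${output}, which are normally supplied through a job.properties file at submission time. A minimal sketch of what that file might look like for this setup (the hostnames and ports are taken from the workflow above; the input/output paths are placeholders, not actual values from the post):

```properties
# job.properties sketch -- values below mirror the hostnames in workflow.xml
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
jobTracker=sandbox-hdp.hortonworks.com:8032
master=yarn
# Needed so the Spark action can find the Oozie Spark sharelib
oozie.use.system.libpath=true
# Placeholder paths -- substitute your own input and output locations
input=${nameNode}/path/to/input
output=${nameNode}/path/to/output
```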

Here are screenshots taken from the Oozie UI and the YARN UI (attached). Lastly, here is the specific error message that makes my program fail.

2019-10-19 08:38:18,270 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1571462262129_0020_000002
2019-10-19 08:38:18,424 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-10-19 08:38:18,445 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.security.SecurityUtil.setConfiguration(Lorg/apache/hadoop/conf/Configuration;)V
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1572)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
2019-10-19 08:38:18,447 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1

At first, I thought there might be duplicate hadoop-mapreduce jar files on the server confusing YARN and causing the error message, so I recompiled my Scala source with sbt-assembly, but that didn't help at all. I also considered a bug in my Scala program, but after spark-submitting it on the remote Docker container, it ran flawlessly, wrote the result to HDFS, and YARN correctly showed the application type as "SPARK".
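Since the job launches as a MapReduce application instead of a Spark one, one check worth doing (a suggestion not from the original post; the Oozie URL and default port 11000 are assumptions for this sandbox) is whether the Oozie Spark sharelib is installed and visible to the server:

```shell
# List the sharelibs the Oozie server knows about; "spark" should appear.
oozie admin -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -shareliblist

# Show the jars bundled in the spark sharelib specifically.
oozie admin -oozie http://sandbox-hdp.hortonworks.com:11000/oozie -shareliblist spark
```

If the spark sharelib is missing or stale, the action falls back to launching through the generic MapReduce launcher with mismatched jars, which can produce NoSuchMethodError failures like the one above.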

 

What could be the cause of this problem? Please help.


Re: Spark coordinator job submitted but MapReduce job is shown on Yarn.

New Contributor

After carefully examining what differs between the successful workflow and the failed coordinator, I figured out that Oozie keeps adding a <configuration> element to the action configuration.

 

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>sandbox-hdp.hortonworks.com:8032</job-tracker>
    <name-node>hdfs://sandbox-hdp.hortonworks.com:8020</name-node>
    <master>yarn</master>
    <mode>cluster</mode>
    <name>Spark Ch08 Max Rain Calculator</name>
    <jar>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/spark_rainfall/lib/MaxRainfall.py</jar>
    <spark-opts>--conf spark.yarn.historyServer.address=http://sandbox-hdp.hortonworks.com:18088
        --conf spark.eventLog.dir=hdfs://sandbox-hdp.hortonworks.com:8020/user/spark/applicationHistory
        --conf spark.eventLog.enabled=true
    </spark-opts>
    <arg>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch05/input/rainfall/2015/01/</arg>
    <arg>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/processed/max_rainfall/2015/01/</arg>
    <configuration>
        <property xmlns="">
            <name>oozie.use.system.libpath</name>
            <value>True</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>nameNode</name>
            <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>jobTracker</name>
            <value>sandbox-hdp.hortonworks.com:8032</value>
            <source>programatically</source>
        </property>
        <property xmlns="">
            <name>oozie.libpath</name>
            <value>hdfs://sandbox-hdp.hortonworks.com:8020/user/oozie/share/lib</value>
            <source>programatically</source>
        </property>
    </configuration>
</spark>

 

I don't think the <configuration> element is necessary for running the Spark app, and it seems to be what makes the Spark action crash.


Re: Spark coordinator job submitted but MapReduce job is shown on Yarn.

Guru
@windforces

The first thing I would check is whether you can run your Spark job outside of Oozie. Can you submit your job using spark-submit? Just try to isolate the issue and confirm whether it is an Oozie issue or not.
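A minimal sketch of such a spark-submit invocation, reusing the class and jar path from the original post (deploy mode and the input/output placeholders are assumptions, not values confirmed in this thread):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class life.jugnu.learnoozie.ch08.MaxRainfall \
  --conf spark.eventLog.enabled=true \
  hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/rainbow/target/scala-2.11/rainbow-assembly-1.0.19.jar \
  <input-path> <output-path>
```

If this succeeds while the Oozie submission fails, the problem is in how Oozie launches the action (sharelib, injected configuration) rather than in the Spark application itself.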

Thanks
Eric