CDH 5.8.3 and 5.9 Spark Action on Oozie issues.

Hi All,

I've been developing some Oozie workflows that include Spark actions. As a smoke test, I run the SparkPi application from the Spark examples jar to make sure everything works as expected. My original testing was on CDH 5.8.0, where this workflow ran fine. I've since tested it against 5.8.3 and 5.9.0, and it fails on both. When YARN tries to launch the Spark job, I get the following error:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster


The workflow's HDFS placement, its properties file, and the workflow definition are shown below.

placement:

# hadoop fs -ls -R /foo/oozie/oozieApps/pi
drwxr-xr-x   - foo foo          0 2016-11-15 21:16 /foo/oozie/oozieApps/pi/lib
-rw-r--r--   3 foo foo  107864471 2016-11-15 21:16 /foo/oozie/oozieApps/pi/lib/spark-assembly.jar
-rw-r--r--   3 foo foo        655 2016-11-15 21:13 /foo/oozie/oozieApps/pi/workflow.xml

NOTE: The spark-assembly.jar is a copy of /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.8.3-hadoop2.6.0-cdh5.8.3.jar

pi.properties:

master=yarn-master
mode=cluster
user.name=foo
nameNode=hdfs://example.com:8020
jobTracker=example.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/foo/oozie/oozieApps/pi
sparkExampleJar=${nameNode}/foo/tmp/spark-examples.jar
argN=10
oozie.use.system.libpath=true


workflow.xml:

<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkPi'>
	<start to='spark-node' />
	<action name='spark-node'>
		<spark xmlns="uri:oozie:spark-action:0.1">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<master>${master}</master>
			<mode>${mode}</mode>
			<name>Spark-pi</name>
			<class>org.apache.spark.examples.SparkPi</class>
			<jar>${sparkExampleJar}</jar>
			<arg>${argN}</arg>
		</spark>
		<ok to="end" />
		<error to="fail" />
	</action>
	<kill name="fail">
		<message>Workflow failed, error
			message[${wf:errorMessage(wf:lastErrorNode())}]
		</message>
	</kill>
	<end name='end' />
</workflow-app>


I'm guessing this is some sort of classpath issue: I've checked the contents of spark-assembly.jar, and the class it says it cannot find is indeed in there. Any ideas on how to fix this issue or troubleshoot it further?

1 ACCEPTED SOLUTION

So I found a solution for getting it to work on 5.8.3 and 5.9.

I added the following configuration to the workflow:

<spark-opts>--conf spark.yarn.jar=local:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar</spark-opts>


I don't know why that is necessary on 5.8.3 and 5.9 but not on 5.8.0.
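
For anyone wondering where that line goes: as far as I can tell, the spark-action 0.1 schema expects <spark-opts> after <jar> and before any <arg> elements, so the action from my workflow above would end up looking roughly like the sketch below. The local: scheme tells Spark the jar is already present on every node rather than something to upload, and the parcel path is simply where the assembly lives on my cluster, so treat it as an assumption and adjust it for your install.

```xml
<action name='spark-node'>
	<spark xmlns="uri:oozie:spark-action:0.1">
		<job-tracker>${jobTracker}</job-tracker>
		<name-node>${nameNode}</name-node>
		<master>${master}</master>
		<mode>${mode}</mode>
		<name>Spark-pi</name>
		<class>org.apache.spark.examples.SparkPi</class>
		<jar>${sparkExampleJar}</jar>
		<!-- Added: point YARN at the assembly jar installed on each node.
		     The path is cluster-specific (an assumption for other installs). -->
		<spark-opts>--conf spark.yarn.jar=local:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar</spark-opts>
		<arg>${argN}</arg>
	</spark>
	<ok to="end" />
	<error to="fail" />
</action>
```

The rest of the workflow and pi.properties stay unchanged.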

2 REPLIES 2

Can anyone confirm/deny that they are able to reproduce the problem on their 5.8.3 or 5.9.0 environments?
