
How to get Spark support with Oozie 4.0.0-cdh5.3.2

Expert Contributor

Hi,

As you know, Oozie 4.0.0-cdh5.3.2 in CDH 5.3.2 does not provide a Spark action.

However, we would like to run Spark jobs as part of an Oozie workflow.

How can we work around this in our CDH 5.3.2 environment?

Thanks

Paul

1 ACCEPTED SOLUTION

Super Collaborator

To rule out a problem with your custom jar, can you run the SparkPi example to make sure that the cluster is set up correctly?

We have documented how to run a Spark application, with an example, in our docs.
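For reference, a SparkPi smoke test might look like the following sketch; the examples jar path is an assumption and varies between parcel and package installs:

```shell
# Hypothetical SparkPi smoke test for a CDH 5.3 cluster.
# The jar location is an assumption: parcel installs usually keep it under
# /opt/cloudera/parcels/CDH/lib/spark/lib/, package installs under /usr/lib/spark/lib/.
EXAMPLES_JAR="/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar"

# Build the command first so it can be inspected before running.
CMD="spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster $EXAMPLES_JAR 10"
echo "$CMD"

# On a live cluster, run it with:  eval "$CMD"
# A healthy setup reports "Pi is roughly 3.14..." in the YARN application logs.
```

If SparkPi fails with the same NoClassDefFoundError, the problem is the cluster classpath rather than your application jar.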

 

The error that you show points to a classpath problem: the Spark classes cannot be found on your classpath.

 

Wilfred


6 REPLIES

Super Collaborator

The only way to use Spark when you do not have a Spark action is to use the shell action and build the proper spark-submit command for it.

You will need to make sure that the configuration, classpath, etc. are set up from within the action.
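As a sketch of that approach (the application class, jar name, and arguments are taken from this thread and are otherwise placeholders), the shell action could launch a wrapper script along these lines:

```shell
# Generate a hypothetical script.sh for the Oozie shell action to execute.
# The class, jar name, and arguments are placeholders; replace them with your own.
cat > script.sh <<'EOF'
#!/bin/sh
# Launched by the Oozie shell action on an arbitrary NodeManager, so the
# jar must be shipped with the action (via a <file> element) or fetched first.
spark-submit \
  --class com.cloudera.sparkwordcount.SparkWordCount \
  --master yarn-cluster \
  sparkwordcount-0.0.1-SNAPSHOT.jar /user/paul 2
EOF
chmod +x script.sh
```

Using yarn-cluster mode keeps the driver on the cluster, which is usually what you want when the launching host is an arbitrary NodeManager.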

 

Wilfred

Expert Contributor
Hi Wilfred, I will try to follow your suggestion. Thanks, Paul

Expert Contributor
Hi Wilfred,
Could you give me an example?
Thanks in advance.
Paul

Super Collaborator

Whatever spark-submit command you use on the command line is what you use in the Oozie shell action.

Make sure that you have the proper gateway roles for Spark and YARN installed on the Oozie server so it has the configuration it needs.

 

The rest works as for a standard Oozie shell action: create the workflow, properties, and shell script files, and place them on the machine/HDFS so they can be found.
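To make those steps concrete, here is a minimal sketch of the files involved; the workflow name, paths, and host names are assumptions:

```shell
# Hypothetical layout for the shell-action workflow; all names are illustrative.
mkdir -p spark-shell-wf

cat > spark-shell-wf/workflow.xml <<'EOF'
<workflow-app name="spark-via-shell" xmlns="uri:oozie:workflow:0.4">
  <start to="spark-shell"/>
  <action name="spark-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>script.sh</exec>
      <file>script.sh#script.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Shell action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
EOF

# job.properties would define nameNode, jobTracker and
#   oozie.wf.application.path=${nameNode}/user/paul/spark-shell-wf
# Then upload and run against a live cluster (hosts/paths are placeholders):
#   hdfs dfs -put spark-shell-wf /user/paul/spark-shell-wf
#   oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```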

 

Wilfred

Expert Contributor
Hi Wilfred,
I have installed the Spark gateway, and YARN was already installed with Oozie. Unfortunately, when I ran:
spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master yarn target/sparkwordcount-0.0.1-SNAPSHOT.jar /user/paul 2
I got this error:
Exception in thread "Driver" scala.MatchError: java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf (of class java.lang.NoClassDefFoundError)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:432)
How can I resolve this?
Thanks in advance.
Paul
