
Spark Yarn example in Oozie 4.1 CDH 5.4.0

Explorer

While trying to submit the example Spark job through Oozie 4.1 on CDH 5.4.0 in yarn-client mode, I get the error below (stack trace included). I believe the class WebAppUtils lives in the hadoop-yarn-common jar, and the method in question was added in Hadoop 2.4.0; it does not exist in Hadoop 2.3.0. You can verify this in the Hadoop source history.

 
My suspicion is that an older Hadoop 2.3.0 version of this class is somewhere on the classpath.
 
I searched for the jar to check which version is present, using this command:
sudo find . -name "hadoop-yarn-co*"
 
I found only 2.6.0 version jars. 
 
I should mention that I am using Spark 1.3.1 built against Hadoop 2.6.0.
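Beyond matching on jar file names, one can check which jar actually contains the class in question. A sketch of such a check (the helper name and example path are my own illustration, not from this thread):

```shell
# find_class_jar: print every jar under DIR that contains CLASSFILE.
# Illustrative helper, not a Hadoop/CDH tool.
find_class_jar() {
  local dir="$1" classfile="$2"
  find "$dir" -name "*.jar" 2>/dev/null | while read -r j; do
    # unzip -l lists archive contents without extracting
    if unzip -l "$j" 2>/dev/null | grep -q "$classfile"; then
      echo "$j"
    fi
  done
}

# e.g. find_class_jar /usr/lib org/apache/hadoop/yarn/webapp/util/WebAppUtils.class
```

Running `javap` on the class from the matching jar would then show whether `getProxyHostsAndPortsForAmFilter` is present.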
 
2015-05-05 16:43:38,406 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.webapp.util.WebAppUtils.getProxyHostsAndPortsForAmFilter(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/List;
	at org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer.initFilter(AmFilterInitializer.java:40)
	at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:272)
	at org.apache.hadoop.yarn.webapp.WebApps$Builder$2.<init>(WebApps.java:222)
	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:219)
	at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:136)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1058)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1445)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1441)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1374)
 
I have added the following settings:
 
1. In oozie-site.xml, added the property:
   <name>oozie.service.SparkConfigurationService.spark.configurations</name>
   <value>*=/opt/spark-1.3.1-bin-hadoop2.6/conf</value>
2. In spark-defaults.conf, added the property spark.master.
3. Also have the YARN_CONF_DIR environment variable set.
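For reference, the complete property block in my oozie-site.xml looks like this (the path is from my setup above):

```xml
<!-- oozie-site.xml: point the Oozie Spark action at Spark's conf directory
     for all resource managers ("*") -->
<property>
  <name>oozie.service.SparkConfigurationService.spark.configurations</name>
  <value>*=/opt/spark-1.3.1-bin-hadoop2.6/conf</value>
</property>
```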
 
I also submitted the SparkPi job outside Oozie in yarn-client mode and it succeeded.
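For anyone reproducing this, the workflow's Spark action looks roughly like the following sketch (action name, jar path, and transitions are placeholders, not my exact workflow):

```xml
<!-- Sketch of an Oozie Spark action; names and paths are placeholders -->
<action name="spark-pi">
  <spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-client</master>
    <name>Spark Pi</name>
    <class>org.apache.spark.examples.SparkPi</class>
    <jar>${nameNode}/user/${wf:user()}/lib/spark-examples.jar</jar>
  </spark>
  <ok to="end"/>
  <error to="fail"/>
</action>
```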
 
Thanks,
 
Raj
5 REPLIES

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Master Guru
Have you tried using the Spark version shipped in CDH 5.4.0 instead of a custom-built one (which likely was not built against the CDH 5.4.0 Hadoop versions)?

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Explorer

I will give the Spark 1.3.0 CDH 5.4.0 tarball a try, from:

http://archive.cloudera.com/cdh5/cdh/5/spark-1.3.0-cdh5.4.0.tar.gz

Thanks Harsh for your suggestion.

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Explorer

Changing Spark to the CDH distribution didn't help, but it threw a good pointer. I recompiled the Spark example jar against the CDH 5.4.0 Hadoop 2.6.0 dependencies (I believe it was originally compiled against Hadoop 2.3.0), and that solved the problem.

Another, unrelated issue: if we have an explicit ctx.stop(), we get the error below. Is the Oozie Spark action handling it for us? After we removed the ctx.stop() line, the job completes normally. Thanks Harsh for your timely reply!

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Job cancelled because SparkContext was shut down
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:699)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:698)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:698)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1411)
	at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1346)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1380)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:143)

  

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Master Guru
Glad to hear. What issue did you get when you tried against CDH Spark? Are you using an entirely tarball-based installation?

The action uses SparkSubmit.main(…) from Spark to invoke the arguments/program; I believe that should handle stopping at the end of execution for you.

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Explorer

When I used the Spark tarball from CDH 5.4.0, I got exactly the same NoSuchMethodError. Having gone through the Spark configuration, I continued with the same setup, changed the hadoop-client dependency in my client project's SBT build to 2.6.0-cdh5.4.0, recompiled, and used the new jar.
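In build.sbt terms, the change was along these lines (the resolver URL and the "provided" scope are my assumptions, not quoted from this thread):

```
// build.sbt sketch: pin hadoop-client to the CDH artifact
resolvers += "cloudera-repos" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0-cdh5.4.0" % "provided"
```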