Explorer
Posts: 14
Registered: ‎02-21-2014

Spark Yarn example in Oozie 4.1 CDH 5.4.0

While trying to submit the example Spark job through Oozie 4.1 (CDH 5.4.0) in yarn-client mode, I get the error below (stack trace included). I think the class WebAppUtils lives in the hadoop-yarn-common jar, and this method was added in Hadoop 2.4.0; it was not present in Hadoop 2.3.0. You can check here.

 
I suspected an older Hadoop 2.3.0 version of the class was somewhere on the classpath.
 
I searched for the jar to find its version using this command:

sudo find . -name "hadoop-yarn-co*"
 
I found only 2.6.0 version jars. 
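Beyond searching the filesystem, a small reflection check can confirm which version of the class actually wins at runtime. This is a hedged sketch, not part of the original post: the class and method names are taken from the stack trace below, and the check has to be run with the same classpath as the failing application master (e.g. java -cp "$(hadoop classpath)" MethodCheck).

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Sketch: check whether a class on the current classpath exposes a given
// method, and report which jar the class was loaded from. Useful for
// diagnosing NoSuchMethodError caused by a stale jar shadowing a newer one.
public class MethodCheck {

    public static boolean hasMethod(String className, String methodName)
            throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        for (Method m : c.getMethods()) {
            if (m.getName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static String loadedFrom(String className)
            throws ClassNotFoundException {
        CodeSource src = Class.forName(className)
                .getProtectionDomain().getCodeSource();
        // JDK bootstrap classes have no code source.
        return src != null ? src.getLocation().toString() : "bootstrap classpath";
    }

    public static void main(String[] args) throws Exception {
        // Names from the stack trace in this thread.
        String cls = "org.apache.hadoop.yarn.webapp.util.WebAppUtils";
        String mth = "getProxyHostsAndPortsForAmFilter";
        System.out.println(cls + " loaded from " + loadedFrom(cls));
        System.out.println(mth + (hasMethod(cls, mth) ? " FOUND" : " MISSING"));
    }
}
```

If the method is reported missing, loadedFrom shows which jar supplied the older class.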
 
I want to mention that I am using Spark 1.3.1 built against Hadoop 2.6.0.
 
2015-05-05 16:43:38,406 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.webapp.util.WebAppUtils.getProxyHostsAndPortsForAmFilter(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/List;
	at org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer.initFilter(AmFilterInitializer.java:40)
	at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:272)
	at org.apache.hadoop.yarn.webapp.WebApps$Builder$2.<init>(WebApps.java:222)
	at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:219)
	at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.serviceStart(MRClientService.java:136)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1058)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1445)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1441)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1374)
 
I have added the following settings:

1. In oozie-site.xml, added the property:
       <name>oozie.service.SparkConfigurationService.spark.configurations</name>
       <value>*=/opt/spark-1.3.1-bin-hadoop2.6/conf</value>
2. In spark-defaults.conf, added the property spark.master.
3. Also have the YARN_CONF_DIR environment variable set.
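For context, my workflow's Spark action looks roughly like this (a sketch only; the jar path and app name here are illustrative assumptions, not the actual job's values):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-pi-wf">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-client</master>
            <name>SparkPi</name>
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>${nameNode}/user/${wf:user()}/lib/spark-examples.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail">
        <message>Spark action failed</message>
    </fail>
    <end name="end"/>
</workflow-app>
```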
 
I also submitted the Spark Pi job outside Oozie in yarn-client mode, and it was successful.
 
Thanks,
 
Raj
Posts: 1,892
Kudos: 432
Solutions: 302
Registered: ‎07-31-2013

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Have you tried using the Spark version shipped in CDH 5.4.0 instead of a
custom-built one (which likely was not built against the CDH 5.4.0 Hadoop
versions)?

Explorer

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

I will give the Spark 1.3.0 CDH 5.4.0 tarball a try, from:

 

http://archive.cloudera.com/cdh5/cdh/5/spark-1.3.0-cdh5.4.0.tar.gz

 

Thanks Harsh for your suggestion. 

Explorer

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Changing Spark to the CDH distribution didn't help, but it gave a good pointer. I recompiled the Spark example jar against the CDH 5.4.0 Hadoop 2.6.0 dependencies (I believe it was previously compiled against Hadoop 2.3.0), and that solved the problem.

 

Another, unrelated issue: if we have an explicit ctx.stop(), we get the error below. Is the Oozie Spark action handling it for us? After we removed the ctx.stop() line, the job completes normally. Thanks Harsh for your timely reply!

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Job cancelled because SparkContext was shut down
org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:699)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:698)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:698)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1411)
	at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1346)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1380)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:143)

  


Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

Glad to hear it. What issue did you get when you tried CDH Spark?
Are you using an entirely tarball-based installation?

The action uses SparkSubmit.main(…) from Spark to invoke the
arguments/program - I believe that should handle stopping the context at
the end of execution for you.

Explorer

Re: Spark Yarn example in Oozie 4.1 CDH 5.4.0

When I used the Spark tarball from CDH 5.4.0, I got exactly the same NoSuchMethodError. Having gone through that Spark configuration, I continued with the same setup, changed the hadoop-client dependency in my client project's SBT build to 2.6.0-cdh5.4.0, recompiled, and used the new jar.
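For anyone hitting the same thing, the SBT change was along these lines (a sketch; the exact layout depends on your build file, and the Cloudera repository resolver is needed for the -cdh artifacts):

```scala
// build.sbt sketch (assumed layout): point hadoop-client at the CDH artifact
resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0-cdh5.4.0"
```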
