Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Oozie Spark Action on HDP 2.4 - NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()

avatar
Explorer

Hello everyone,

Is there anyone tried to use Oozie Spark action on HDP 2.4. I believe it's now supported and I'm trying to schedule a job through Oozie using Spark action but facing several issues.

I have already followed Hortonworks KB article related to "How-To_ How to run Spark Action in oozie of HDP" and followed each steps. I'm also able to submit my jar through spark-submit using "yarn-cluster" mode but when I place it through Oozie I face the following error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
	at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

I guess some library mismatch in the classpath but not sure which one. My Oozie workflow is pretty simple, just reading some files from S3 and print the count:

<workflow-app name="POC-ETL" xmlns="uri:oozie:workflow:0.5">
        <start to="spark"/>
        <action name="spark">
                <spark xmlns="uri:oozie:spark-action:0.1">
                        <job-tracker>${jobTracker}</job-tracker>
                        <name-node>${nameNode}</name-node>
                        <configuration>
                                <property>
                                        <name>mapred.compress.map.output</name>
                                        <value>true</value>
                                </property>
                        </configuration>
                        <master>${master}</master>
                        <!-- <mode>cluster</mode> -->
                        <name>Sample</name>
                        <class>com.example.PrintCount</class>
                        <jar>${nameNode}/user/${wf:user()}/poc/workflow/lib/print-count.jar</jar>
                        <spark-opts>spark.driver.extraJavaOptions=-Dhdp.version=2.4.0.0-169</spark-opts>
                        <arg>s3n://....</arg>
                </spark>
                <ok to="end"/>
                <error to="fail"/>
        </action>
        <kill name="fail">
                <message>ETL Ingest Failed: error  [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
</workflow-app>

Here is my job.properties file:

oozie.wf.application.path=${nameNode}/user/cidev/poc/workflow
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark = spark,hcatalog,hive
nameNode=hdfs://hdpmaster.cidev:8020
jobTracker=hdp1.cidev:8050
queueName=default
master=yarn-cluster

Here is the list from sharelib Spark:

hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-actor_2.10-2.2.3-shaded-protobuf.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-remote_2.10-2.2.3-shaded-protobuf.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/aws-java-sdk-1.7.4.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/azure-storage-2.2.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/chill-java-0.3.6.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/chill_2.10-0.3.6.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/colt-1.2.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-codec-1.4.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-httpclient-3.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-lang-2.4.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-lang3-3.3.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/commons-net-2.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/compress-lzf-1.0.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/concurrent-1.3.4.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/config-1.0.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/core-site.xml
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-client-2.5.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-framework-2.5.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/curator-recipes-2.5.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/guava-14.0.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hadoop-aws-2.7.1.2.4.0.0-169.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hadoop-azure-2.7.1.2.4.0.0-169.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/hdfs-site.xml
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-annotations-2.2.3.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-core-2.2.3.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jackson-databind-2.2.3.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.activation-1.1.0.v201105071233.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.mail.glassfish-1.4.1.v201005082020.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.servlet-3.0.0.v201112011016.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/javax.transaction-1.1.1.v201105210645.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jblas-1.2.3.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jets3t-0.7.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-continuation-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-http-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-io-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-jndi-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-plus-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-security-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-server-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-servlet-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-util-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-webapp-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jetty-xml-8.1.14.v20131031.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jline-2.12.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/joda-time-2.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-ast_2.10-3.2.10.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-core_2.10-3.2.10.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/json4s-jackson_2.10-3.2.10.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/jsr305-1.3.9.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/kryo-2.21.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/log4j-1.2.16.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/lz4-1.2.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/mapred-site.xml
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-core-3.0.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-graphite-3.0.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-json-3.0.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/metrics-jvm-3.0.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/minlog-1.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/netty-3.6.6.Final.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/netty-all-4.0.23.Final.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/objenesis-1.2.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/paranamer-2.6.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/protobuf-java-2.4.1-shaded.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/py4j-0.8.2.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/pyrolite-2.0.1.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/reflectasm-1.07-shaded.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-compiler-2.10.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-library-2.10.4.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scala-reflect-2.10.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/scalap-2.10.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/slf4j-api-1.6.6.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/slf4j-log4j12-1.6.6.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/snappy-java-1.0.5.3.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-core_2.10-1.1.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-graphx_2.10-1.1.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/spark-streaming_2.10-1.1.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/stream-2.7.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/tachyon-0.5.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/tachyon-client-0.5.0.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/uncommons-maths-1.2.2a.jar
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/yarn-site.xml
	hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/zookeeper-3.4.6.jar

Any helps would be highly appreciated.

Thanks in advance.

Tishu

1 ACCEPTED SOLUTION

avatar
Master Guru

Hi @Chokroma Tusira, try removing all jars from your Oozie share/lib Spark directory in HDFS except those listed here, actually in case of HDP-2.4 you can start with just these two:

oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

As <spark-opts> I put the number of executors and their memory, it worked without hdp.version. Then restart Oozie, and retry your Spark action.

View solution in original post

16 REPLIES 16

avatar
Master Guru

Hi Chokroma,

so the action is still not supported. However there is a tech note out to use it. Seems you used that?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_HDP_RelNotes/content/community_features.h...

The method you mention was added in Spark 1.5 so it seems like you have an old jar file somewhere. It should be the spark assembly. The one in the share lib is 1.6 so that cannot be it. You must have an old version around somewhere.

https://issues.apache.org/jira/browse/SPARK-3071

avatar
Explorer

Thanks @Benjamin Leonhardi, appreciate it. I was under impression that Spark action is now supported under HDP 2.4. My bad 😞

I checked classpath but didn't find why some old jars are still there. In the current cluster, we initially installed HDP 2.3.4, then we upgraded to HDP 2.4. Even in HDP 2.3.4, Spark 1.5.2 was there so it should have found that method. So looks like it still pointing to old version of Spark somehow.

avatar
Master Guru

Hi @Chokroma Tusira, try removing all jars from your Oozie share/lib Spark directory in HDFS except those listed here, actually in case of HDP-2.4 you can start with just these two:

oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

As <spark-opts> I put the number of executors and their memory, it worked without hdp.version. Then restart Oozie, and retry your Spark action.

avatar
Explorer

Thanks @Predrag Minovic, I got past that error but now it's getting timed out. Looks like when the job is placed, its pointing to invalid resource manager url. Not sure why it's not picking up the right configuration.

2016-05-03 17:04:09,439 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2016-05-03 17:04:10,448 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:11,449 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:12,450 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:13,450 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:14,451 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-03 17:04:15,452 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

By the way, should we ignore this warning?

 org.apache.spark.SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.

Thanks again.

avatar
Master Guru

Check your jobTracker setting in your job.properties, you had "jobTracker=hdp1.cidev:8050" in your question. And make sure you use "-conf job.properties" when you submit your job.

avatar
Explorer

Thanks again @Predrag Minovic, I tried that then I saw this:

https://community.hortonworks.com/questions/11599/how-to-change-resourcemanager-port-on-oozie.html

https://community.hortonworks.com/questions/21248/spark-action-always-submit-to-00008032.html

Looks like it's a known issue. As a work around, I changed my resource manager port to 8032 and restarted again but didn't help. It was still pointing to 0.0.0.0/0.0.0.0:8032. Though at this point, the port is right, but not the hostname, it should be hdp1.cidev/hdp1.cidev:8032. So looks like Oozie/Spark action is not picking up the right configuration. Any clue?

Thanks again.

avatar
Master Guru

Do you still have 4 *-site.xml files in your Oozie share/lib Spark directory in HDFS? If not upload them to HDFS, or even better, to be sure, remove them, and upload the latest versions. Then restart Oozie and retry your action.

avatar
Explorer

That did the trick! Thanks a lot @Predrag Minovic. Should I consider that as a work-around for now? I never had to do that for other Oozie actions. Thanks again.

avatar
Master Guru

Hi @Chokroma Tusir, yes, a work-around in this version of HDP. As the docs say "there is a community solution", well, we are the community! 🙂 And thanks for a rep point! Can you also accept the answer to help us manage answered questions. Tnx!