Created 04-28-2016 11:42 PM
Hello everyone,
Has anyone tried to use the Oozie Spark action on HDP 2.4? I believe it is now supported, and I am trying to schedule a job through Oozie using the Spark action, but I am facing several issues.
I have already followed the Hortonworks KB article "How-To: How to run Spark Action in Oozie of HDP" step by step. I am also able to submit my jar through spark-submit in "yarn-cluster" mode, but when I run it through Oozie I get the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.DEFAULT_DRIVER_MEM_MB()I
    at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:49)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1120)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I suspect some library mismatch in the classpath, but I am not sure which one. My Oozie workflow is pretty simple; it just reads some files from S3 and prints the count:
<workflow-app name="POC-ETL" xmlns="uri:oozie:workflow:0.5">
    <start to="spark"/>
    <action name="spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <!-- <mode>cluster</mode> -->
            <name>Sample</name>
            <class>com.example.PrintCount</class>
            <jar>${nameNode}/user/${wf:user()}/poc/workflow/lib/print-count.jar</jar>
            <spark-opts>spark.driver.extraJavaOptions=-Dhdp.version=2.4.0.0-169</spark-opts>
            <arg>s3n://....</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>ETL Ingest Failed: error [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Here is my job.properties file:
oozie.wf.application.path=${nameNode}/user/cidev/poc/workflow
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark,hcatalog,hive
nameNode=hdfs://hdpmaster.cidev:8020
jobTracker=hdp1.cidev:8050
queueName=default
master=yarn-cluster
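For reference, this is how I submit the workflow from the command line. A minimal sketch, assuming the Oozie server runs on hdpmaster.cidev with the default HDP port 11000 (adjust the URL for your cluster):

```shell
# Assumed Oozie server URL -- HDP's default Oozie port is 11000.
OOZIE_URL="http://hdpmaster.cidev:11000/oozie"

if command -v oozie >/dev/null 2>&1; then
  # -run submits the workflow and starts it in one step
  oozie job -oozie "$OOZIE_URL" -config job.properties -run
else
  echo "oozie client not found; run this on a cluster node"
fi
```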
Here is the listing of the Spark sharelib:
All files are under hdfs://hdpmaster.cidev:8020/user/oozie/share/lib/spark/:

akka-actor_2.10-2.2.3-shaded-protobuf.jar
akka-remote_2.10-2.2.3-shaded-protobuf.jar
akka-slf4j_2.10-2.2.3-shaded-protobuf.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
chill-java-0.3.6.jar
chill_2.10-0.3.6.jar
colt-1.2.0.jar
commons-codec-1.4.jar
commons-httpclient-3.1.jar
commons-lang-2.4.jar
commons-lang3-3.3.2.jar
commons-net-2.2.jar
compress-lzf-1.0.0.jar
concurrent-1.3.4.jar
config-1.0.2.jar
core-site.xml
curator-client-2.5.0.jar
curator-framework-2.5.0.jar
curator-recipes-2.5.0.jar
guava-14.0.1.jar
hadoop-aws-2.7.1.2.4.0.0-169.jar
hadoop-azure-2.7.1.2.4.0.0-169.jar
hdfs-site.xml
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-databind-2.2.3.jar
javax.activation-1.1.0.v201105071233.jar
javax.mail.glassfish-1.4.1.v201005082020.jar
javax.servlet-3.0.0.v201112011016.jar
javax.transaction-1.1.1.v201105210645.jar
jblas-1.2.3.jar
jets3t-0.7.1.jar
jetty-continuation-8.1.14.v20131031.jar
jetty-http-8.1.14.v20131031.jar
jetty-io-8.1.14.v20131031.jar
jetty-jndi-8.1.14.v20131031.jar
jetty-plus-8.1.14.v20131031.jar
jetty-security-8.1.14.v20131031.jar
jetty-server-8.1.14.v20131031.jar
jetty-servlet-8.1.14.v20131031.jar
jetty-util-8.1.14.v20131031.jar
jetty-webapp-8.1.14.v20131031.jar
jetty-xml-8.1.14.v20131031.jar
jline-2.12.jar
joda-time-2.1.jar
json4s-ast_2.10-3.2.10.jar
json4s-core_2.10-3.2.10.jar
json4s-jackson_2.10-3.2.10.jar
jsr305-1.3.9.jar
kryo-2.21.jar
log4j-1.2.16.jar
lz4-1.2.0.jar
mapred-site.xml
metrics-core-3.0.2.jar
metrics-graphite-3.0.0.jar
metrics-json-3.0.2.jar
metrics-jvm-3.0.2.jar
minlog-1.2.jar
netty-3.6.6.Final.jar
netty-all-4.0.23.Final.jar
objenesis-1.2.jar
oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
paranamer-2.6.jar
protobuf-java-2.4.1-shaded.jar
py4j-0.8.2.1.jar
pyrolite-2.0.1.jar
reflectasm-1.07-shaded.jar
scala-compiler-2.10.0.jar
scala-library-2.10.4.jar
scala-reflect-2.10.0.jar
scalap-2.10.0.jar
slf4j-api-1.6.6.jar
slf4j-log4j12-1.6.6.jar
snappy-java-1.0.5.3.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
spark-core_2.10-1.1.0.jar
spark-graphx_2.10-1.1.0.jar
spark-streaming_2.10-1.1.0.jar
stream-2.7.0.jar
tachyon-0.5.0.jar
tachyon-client-0.5.0.jar
uncommons-maths-1.2.2a.jar
yarn-site.xml
zookeeper-3.4.6.jar
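Note that the listing mixes two Spark versions: the Spark 1.6.0 assembly sits next to spark-core_2.10-1.1.0.jar and friends, which is exactly the kind of mismatch that produces a NoSuchMethodError. A sketch for confirming which jars bundle the conflicting class (paths are from the listing above; this only does anything useful on a node with the hdfs client):

```shell
SHARELIB=/user/oozie/share/lib/spark

if command -v hdfs >/dev/null 2>&1; then
  for jar in spark-core_2.10-1.1.0.jar \
             spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar; do
    hdfs dfs -get "$SHARELIB/$jar" /tmp/
    # org.apache.spark.util.Utils appearing in both jars means two Spark
    # versions end up on the launcher classpath at the same time.
    if unzip -l "/tmp/$jar" | grep -q 'org/apache/spark/util/Utils'; then
      echo "$jar bundles org.apache.spark.util.Utils"
    fi
  done
else
  echo "hdfs client not found; run on a cluster node"
fi
```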
Any help would be highly appreciated.
Thanks in advance.
Tishu
Created 05-03-2016 05:03 PM
Hi @Chokroma Tusira, try removing all jars from your Oozie share/lib Spark directory in HDFS except those listed here; in fact, on HDP 2.4 you can start with just these two:

oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

In <spark-opts> I set only the number of executors and their memory; it worked without the hdp.version option. Then restart Oozie and retry your Spark action.
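A sketch of that cleanup as shell commands, assuming the default HDP sharelib path and an Oozie server at hdpmaster.cidev:11000 (both are assumptions; adjust for your cluster):

```shell
SHARELIB=/user/oozie/share/lib/spark
KEEP1=oozie-sharelib-spark-4.2.0.2.4.0.0-169.jar
KEEP2=spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

if command -v hdfs >/dev/null 2>&1; then
  # Delete every jar except the two we keep; the *-site.xml files stay put.
  for path in $(hdfs dfs -ls "$SHARELIB" | awk '{print $NF}' | grep '\.jar$'); do
    base=$(basename "$path")
    if [ "$base" != "$KEEP1" ] && [ "$base" != "$KEEP2" ]; then
      hdfs dfs -rm "$path"
    fi
  done
  # Tell the running Oozie server to re-scan the sharelib, then restart Oozie.
  oozie admin -oozie http://hdpmaster.cidev:11000/oozie -sharelibupdate
else
  echo "hdfs/oozie clients not found; run on a cluster node"
fi
```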
Created 05-05-2016 07:50 AM
The Oozie Spark action is coming in HDP 2.4.2; it is not supported in earlier HDP versions.
Created 07-21-2016 12:10 PM
Hi everyone,
I am facing a similar issue while running the Spark action with Oozie. I have placed only the two jars mentioned above in /user/share/lib/spark/ and removed all the other jars, but I am still getting the same error.
Created 07-21-2016 12:15 PM
I am using HDP 2.4 with Spark 1.6.0.
Created 07-21-2016 04:23 PM
Hi Kanwar,
Did you place the *-site.xml files in the Oozie share lib? If not, put them there, then update the share lib and restart Oozie. Hope it helps.
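The copy-and-refresh could look like this. A sketch only; it assumes HDP's client configs live under /etc/hadoop/conf and the sharelib is at its default path:

```shell
SHARELIB=/user/oozie/share/lib/spark

if command -v hdfs >/dev/null 2>&1; then
  # Push the cluster's *-site.xml files into the Spark sharelib (-f overwrites).
  hdfs dfs -put -f /etc/hadoop/conf/core-site.xml \
                   /etc/hadoop/conf/hdfs-site.xml \
                   /etc/hadoop/conf/yarn-site.xml \
                   "$SHARELIB/"
  # Refresh the sharelib so launchers pick the files up, then restart Oozie.
  oozie admin -oozie http://hdpmaster.cidev:11000/oozie -sharelibupdate
else
  echo "hdfs client not found; run on a cluster node"
fi
```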
Created 07-22-2016 01:30 PM
Hi Chokroma,
I have placed oozie-site.xml, hive-site.xml, core-site.xml, and tez-site.xml in the Oozie share lib Spark folder in HDFS.
That resolved the earlier error, but now when running the Spark action I get the error below:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/hadoop/yarn/local/usercache/root/appcache/application_1469191898202_0003/container_e108_1469191898202_0003_01_000002/–conf does not exist
java.io.FileNotFoundException: File file:/hadoop/yarn/local/usercache/root/appcache/application_1469191898202_0003/container_e108_1469191898202_0003_01_000002/–conf does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
    at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:125)
    at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
    at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2193)
    at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2189)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2189)
    at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:601)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:327)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$distribute$1(Client.scala:407)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$5.apply(Client.scala:446)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$5.apply(Client.scala:444)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:444)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:722)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1065)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1125)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:104)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:95)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:38)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Created 07-22-2016 02:35 PM
It is showing this error:
2016-07-22 14:28:55,660 INFO [main] org.apache.spark.deploy.yarn.Client: Source and destination file systems are the same. Not copying file:/hadoop/yarn/local/usercache/root/appcache/application_1469194745655_0011/container_e109_1469194745655_0011_01_000002/–conf
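One thing worth noting: the missing "file" in this log line and in the exception above ends in "–conf" with an en dash (U+2013), not the ASCII "--" of the --conf flag, so Spark treated the pasted option as a file to ship to the cluster. This often happens when spark-opts are copied from a web page. A sketch for scanning a workflow file for such characters (the demo file name is made up):

```shell
EN_DASH=$(printf '\342\200\223')   # U+2013 (en dash) in UTF-8
EM_DASH=$(printf '\342\200\224')   # U+2014 (em dash) in UTF-8

# Print any line of the given file that contains an en dash or em dash.
check_dashes() {
  grep -n "$EN_DASH\|$EM_DASH" "$1"
}

# Demo against a throwaway file with the bad character in <spark-opts>:
printf '<spark-opts>%sconf spark.executor.memory=2g</spark-opts>\n' "$EN_DASH" \
  > /tmp/wf-dash-demo.xml
check_dashes /tmp/wf-dash-demo.xml
```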