Oozie distributes application jar twice making the spark job fail

Contributor

I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed.

I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-s... to add the Spark2 libraries to Oozie 4.2.

I then submitted the job with this additional property:

oozie.action.sharelib.for.spark=spark2

The error I'm getting is this:

    2017-07-19 12:36:53,271  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
    2017-07-19 12:36:53,275  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
    java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:629)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:620)
        at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:620)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:619)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:745)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

I have read that Spark 2.1 will not work via Oozie because of a change in how Spark handles files that are found multiple times in the distributed cache.

Keep in mind that I'm using Ambari and HDP2.6. How can I deal with this?

@Tom Shields

4 REPLIES

Hi Tom,
I have the same issue. Did you finally manage to get Oozie working with Spark2?

Régis

New Contributor

I'm having exactly the same issue you describe - have you managed to solve this yet?

@James Porritt @easyoups @Ivan Majnaric The above error occurs because the same jar files exist in both the oozie and spark2 directories inside the Oozie share lib in HDFS.

Follow this HC article to resolve the problem:

https://community.hortonworks.com/content/supportkb/150151/main-class-orgapacheoozieactionhadoopspar...

HTH


Expert Contributor

https://issues.apache.org/jira/browse/OOZIE-2787 - This is the bug you are hitting. To get rid of the error, make sure that no duplicate jar files are present across the oozie.libpath, Oozie share lib, and Spark share lib directories.
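
To see which jars are duplicated, it may help to compare the file listings of those directories. The following is only a rough sketch, not an official tool: the function name `find_duplicate_jars` is made up for illustration, and the sample listing is hypothetical (the spark2 copy of the jar is assumed, based on the error above). In practice the paths would come from something like `hdfs dfs -ls` run on each directory.

```python
from collections import defaultdict
from posixpath import basename, dirname


def find_duplicate_jars(paths):
    """Return {jar name: sorted list of directories} for jars found in more than one directory."""
    dirs_by_name = defaultdict(set)
    for path in paths:
        if path.endswith(".jar"):
            dirs_by_name[basename(path)].add(dirname(path))
    return {
        name: sorted(dirs)
        for name, dirs in dirs_by_name.items()
        if len(dirs) > 1
    }


# Hypothetical listing; the second and third entries are assumptions for illustration.
listing = [
    "/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar",
    "/user/oozie/share/lib/lib_20170613110051/spark2/aws-java-sdk-core-1.10.6.jar",
    "/user/oozie/share/lib/lib_20170613110051/spark2/example-only.jar",
]

duplicates = find_duplicate_jars(listing)
for name, dirs in sorted(duplicates.items()):
    print(name, "->", ", ".join(dirs))
```

Any jar name reported here exists in more than one of the listed directories and is a candidate for the "multiple times to the distributed cache" error. Which copy to remove depends on your setup, so check the linked article and JIRA before deleting anything.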