Oozie distributes application jar twice making the spark job fail

Contributor

I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed.

I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-s... to add the Spark2 libraries to Oozie 4.2.

I then submit the job with this additional property:

    oozie.action.sharelib.for.spark=spark2

This is the error I get:

    2017-07-19 12:36:53,271  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
    2017-07-19 12:36:53,275  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
    java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:629)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13$anonfun$apply$8.apply(Client.scala:620)
        at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:620)
        at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$13.apply(Client.scala:619)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:745)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
        at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
        at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
        at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
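
For anyone triaging a similar launcher log, the offending jar can be pulled out mechanically instead of by eye. This is a minimal Python sketch, assuming the log uses the exact message wording shown in the trace above; the function name `duplicated_jars` is hypothetical and not part of Oozie or Spark.

```python
import re

# Pattern for the message seen in the trace above. The wording is taken from
# that one log, so treat it as an assumption for other Spark versions.
_DUP_RE = re.compile(
    r"Attempt to add \((?P<path>[^)]+)\) multiple times to the distributed cache"
)


def duplicated_jars(log_text):
    """Return the distinct paths reported as added more than once, in first-seen order."""
    seen = []
    for match in _DUP_RE.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen
```

Feeding it the three log lines above would yield a single entry, the `aws-java-sdk-core-1.10.6.jar` path, because the same message is repeated for one jar.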

I have read that the Oozie Spark action does not work with Spark 2.1 because of a change in how Spark handles files that appear more than once in the distributed cache.

Keep in mind that I'm using Ambari and HDP 2.6. How can I deal with this?

@Tom Shields


Re: Oozie distributes application jar twice making the spark job fail

Hi Tom,
I have the same issue. Did you finally manage to get Oozie working with Spark2?

Régis

Re: Oozie distributes application jar twice making the spark job fail

New Contributor

I'm having exactly the same issue you describe - have you managed to solve this yet?

Re: Oozie distributes application jar twice making the spark job fail

@James Porritt @easyoups @Ivan Majnaric The above error occurs because the same jar files exist in both the oozie and spark2 directories inside the Oozie share lib in HDFS.

Follow this HCC article to resolve the problem:

https://community.hortonworks.com/content/supportkb/150151/main-class-orgapacheoozieactionhadoopspar...

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Re: Oozie distributes application jar twice making the spark job fail

Expert Contributor

You are hitting the bug tracked as https://issues.apache.org/jira/browse/OOZIE-2787. To get rid of this error, make sure that no jar file is duplicated across the oozie.libpath, Oozie share lib, and Spark share lib directories.
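
A rough way to spot such duplicates is to compare the jar filenames in each directory. The Python sketch below is only an illustration under stated assumptions: the function name `find_duplicate_jars` is hypothetical, the directory labels are arbitrary, and the path lists are assumed to have been collected beforehand (for example from `hdfs dfs -ls` output). It compares filenames only, so two different versions of the same library are not flagged.

```python
import posixpath
from collections import defaultdict


def find_duplicate_jars(listings):
    """Map each jar filename found in more than one directory to the labels holding it.

    listings: dict of directory label -> iterable of jar paths in that directory.
    """
    locations = defaultdict(set)
    for label, paths in listings.items():
        for path in paths:
            name = posixpath.basename(path.rstrip("/"))
            if name.endswith(".jar"):
                locations[name].add(label)
    return {
        name: sorted(labels)
        for name, labels in locations.items()
        if len(labels) > 1
    }


# Illustrative input only: the spark2 path and the second jar are assumptions.
listings = {
    "oozie": ["/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar"],
    "spark2": [
        "/user/oozie/share/lib/lib_20170613110051/spark2/aws-java-sdk-core-1.10.6.jar",
        "/user/oozie/share/lib/lib_20170613110051/spark2/example-only.jar",
    ],
}
print(find_duplicate_jars(listings))
# {'aws-java-sdk-core-1.10.6.jar': ['oozie', 'spark2']}
```

Any filename reported here is a candidate for removal from one of the directories. Which copy to remove, and whether the share lib needs to be refreshed afterwards, depends on the cluster setup and is not decided by this script.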
