
Oozie-Spark action cannot find file on HDFS

Contributor

I'm trying to create a Spark action in Oozie, and I uploaded my .jar to HDFS. When I submit the job in Oozie, I get this error:

2017-07-18 10:24:32,726  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] Start action [0000007-170717153234639-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-07-18 10:24:32,728  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] [***0000007-170717153234639-oozie-oozi-W@:start:***]Action status=DONE
2017-07-18 10:24:32,728  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] [***0000007-170717153234639-oozie-oozi-W@:start:***]Action updated in DB!
2017-07-18 10:24:32,778  INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W@:start:
2017-07-18 10:24:32,779  INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W
2017-07-18 10:24:32,798  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Start action [0000007-170717153234639-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-07-18 10:24:35,045  INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Trying to get job [job_1500298352706_0009], attempt [1]
2017-07-18 10:24:35,074  INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1500298352706_0009] status [RUNNING]
2017-07-18 10:24:35,076  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] [***0000007-170717153234639-oozie-oozi-W@spark_1***]Action status=RUNNING
2017-07-18 10:24:35,076  INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] [***0000007-170717153234639-oozie-oozi-W@spark_1***]Action updated in DB!
2017-07-18 10:24:35,084  INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W@spark_1
2017-07-18 10:25:25,159  INFO CallbackServlet:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] callback for action [0000007-170717153234639-oozie-oozi-W@spark_1]
2017-07-18 10:25:25,201  INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Trying to get job [job_1500298352706_0009], attempt [1]
2017-07-18 10:25:25,289  INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] action completed, external ID [job_1500298352706_0009]
2017-07-18 10:25:25,425  WARN SparkActionExecutor:523 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
2017-07-18 10:25:25,427  WARN SparkActionExecutor:523 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Launcher exception: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
java.io.FileNotFoundException: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
	at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:371)
	at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:487)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$12.apply(Client.scala:598)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$12.apply(Client.scala:597)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:597)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Does anyone know a fix or a workaround? I have read that spark-submit cannot read jars that are on HDFS.

This is a preview of my workflow:

<workflow-app name="Workflow2"
    xmlns="uri:oozie:workflow:0.5">
    <start to="spark_1"/>
    <action name="spark_1">
        <spark
            xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>ScalaPi</name>
            <class>org.apache.spark.examples.ScalaPi</class>
            <jar>/spark-examples_2.11-2.1.0.2.6.0.3-8.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
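For reference, this is roughly how I uploaded the jar and submitted the job (host names are placeholders, since I redacted the real URLs):

```shell
# Upload the workflow definition and the jar to HDFS.
# The jar goes to the HDFS root to match <jar>/spark-examples...</jar> above.
hdfs dfs -put workflow.xml /user/admin/workflow2/
hdfs dfs -put spark-examples_2.11-2.1.0.2.6.0.3-8.jar /

# job.properties used for submission (hosts are placeholders)
cat > job.properties <<'EOF'
nameNode=hdfs://namenode-host:8020
resourceManager=resourcemanager-host:8050
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/admin/workflow2
EOF

# Submit the workflow to the Oozie server
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```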

UPDATE 1:

I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed.

I also tried using the full HDFS path, like

hdfs://xxxx.xxx:8020/spark-examples_2.11-2.1.0.2.6.0.3-8.jar, after which I got this:

2017-07-19 12:36:53,271  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
2017-07-19 12:36:53,275  WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:629)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:620)
	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:620)
	at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:619)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
	at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
	at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
	at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)

Is anyone familiar with this issue?

P.S. I erased the HDFS URLs so you don't get confused 🙂
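In case it helps anyone checking: the jar can be verified at the absolute HDFS path with a plain listing (host redacted as above):

```shell
# Confirm the jar actually exists at the URI referenced in <jar>
hdfs dfs -ls hdfs://xxxx.xxx:8020/spark-examples_2.11-2.1.0.2.6.0.3-8.jar
```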

3 REPLIES

Guru

@Ivan Majnaric, it looks like your job is trying to find the jar on the local filesystem instead of HDFS.

java.io.FileNotFoundException: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist

Please follow the article below to set up a Spark-Oozie action workflow.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-s...
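Since you are on HDP 2.6 with Spark2, the guide essentially has you create a spark2 sharelib and point the Spark action at it. A rough sketch of the relevant checks (the Oozie host is a placeholder; verify the sharelib name against your own cluster):

```shell
# List the sharelibs the Oozie server knows about; a "spark2" entry should
# appear once the Spark2 sharelib has been set up per the guide
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist

# Then point the Spark action at the spark2 sharelib in job.properties:
#   oozie.action.sharelib.for.spark=spark2
# Pulling jars from both the default spark and oozie sharelibs at once is
# a common cause of "multiple times to the distributed cache" errors.
```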

Contributor

I updated my question after your answer; can you please check it, @yvora? 🙂

Hello Ivan,

I have the same problem with HDP 2.6.4. Is it solved for you?

Regards,

Chris
