Created 07-18-2017 08:30 AM
I'm trying to run a Spark action in Oozie, and I uploaded my .jar to HDFS. When I submit the job through Oozie I get this error:
2017-07-18 10:24:32,726 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] Start action [0000007-170717153234639-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-07-18 10:24:32,728 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] [***0000007-170717153234639-oozie-oozi-W@:start:***]Action status=DONE
2017-07-18 10:24:32,728 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] [***0000007-170717153234639-oozie-oozi-W@:start:***]Action updated in DB!
2017-07-18 10:24:32,778 INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@:start:] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W@:start:
2017-07-18 10:24:32,779 INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W
2017-07-18 10:24:32,798 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Start action [0000007-170717153234639-oozie-oozi-W@spark_1] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2017-07-18 10:24:35,045 INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Trying to get job [job_1500298352706_0009], attempt [1]
2017-07-18 10:24:35,074 INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] checking action, hadoop job ID [job_1500298352706_0009] status [RUNNING]
2017-07-18 10:24:35,076 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] [***0000007-170717153234639-oozie-oozi-W@spark_1***]Action status=RUNNING
2017-07-18 10:24:35,076 INFO ActionStartXCommand:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] [***0000007-170717153234639-oozie-oozi-W@spark_1***]Action updated in DB!
2017-07-18 10:24:35,084 INFO WorkflowNotificationXCommand:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] No Notification URL is defined. Therefore nothing to notify for job 0000007-170717153234639-oozie-oozi-W@spark_1
2017-07-18 10:25:25,159 INFO CallbackServlet:520 - SERVER[bigdata3.int.ch] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] callback for action [0000007-170717153234639-oozie-oozi-W@spark_1]
2017-07-18 10:25:25,201 INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Trying to get job [job_1500298352706_0009], attempt [1]
2017-07-18 10:25:25,289 INFO SparkActionExecutor:520 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] action completed, external ID [job_1500298352706_0009]
2017-07-18 10:25:25,425 WARN SparkActionExecutor:523 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
2017-07-18 10:25:25,427 WARN SparkActionExecutor:523 - SERVER[bigdata3.int.ch] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000007-170717153234639-oozie-oozi-W] ACTION[0000007-170717153234639-oozie-oozi-W@spark_1] Launcher exception: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
java.io.FileNotFoundException: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
    at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:371)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$12.apply(Client.scala:598)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$12.apply(Client.scala:597)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:597)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Does anyone know how to fix this, or a workaround? I have read that spark-submit cannot read jars stored on HDFS.
This is a preview of my workflow:
<workflow-app name="Workflow2" xmlns="uri:oozie:workflow:0.5">
    <start to="spark_1"/>
    <action name="spark_1">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${resourceManager}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>ScalaPi</name>
            <class>org.apache.spark.examples.ScalaPi</class>
            <jar>/spark-examples_2.11-2.1.0.2.6.0.3-8.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
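For context, the FileNotFoundException in the log shows the launcher resolved the jar path against the local filesystem (note the file:/ scheme). A common way to avoid this is to upload the jar into the workflow application's lib/ directory on HDFS and reference it relative to the workflow, or to fully qualify the path with ${nameNode}. This is only a sketch, not a guaranteed fix — the /user/admin/workflows/Workflow2 application path is a hypothetical example; substitute your own:

```xml
<!-- Sketch only: assumes the workflow app is deployed at
     ${nameNode}/user/admin/workflows/Workflow2 (hypothetical path) -->
<spark xmlns="uri:oozie:spark-action:0.2">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>ScalaPi</name>
    <class>org.apache.spark.examples.ScalaPi</class>
    <!-- Option 1: jar placed in the workflow's lib/ directory, referenced by name -->
    <jar>spark-examples_2.11-2.1.0.2.6.0.3-8.jar</jar>
    <!-- Option 2 (alternative): fully qualified HDFS URI
    <jar>${nameNode}/user/admin/workflows/Workflow2/lib/spark-examples_2.11-2.1.0.2.6.0.3-8.jar</jar>
    -->
</spark>
```

Either way the jar ends up on HDFS where the YARN launcher can copy it, instead of being looked up on the launcher node's local disk.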
UPDATE 1:
I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed.
I also tried using the full HDFS path, like
hdfs://xxxx.xxx:8020/spark-examples_2.11-2.1.0.2.6.0.3-8.jar
after which I got this:
2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
2017-07-19 12:36:53,275 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:629)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:620)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:620)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:619)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
    at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
    at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
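A note on this second error: the "multiple times to the distributed cache" failure typically means the same sharelib jar is being shipped to YARN from more than one source (for example, both the oozie and spark sharelib directories get pulled in). A commonly suggested mitigation on HDP with Spark2 is to pin the action to the Spark2 sharelib in job.properties. This is a sketch only — the property names are standard Oozie settings, but the spark2 value assumes the Spark2 sharelib has been installed on your cluster and may not resolve every variant of this error:

```
# job.properties sketch -- assumes a Spark2 sharelib named "spark2"
# has been created under /user/oozie/share/lib
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark2
```

With oozie.action.sharelib.for.spark set, the launcher should pick up jars from a single sharelib directory instead of merging overlapping ones.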
Is anyone familiar with this issue?
P.S. I redacted the HDFS URLs so you don't get confused 🙂
Created 07-18-2017 05:05 PM
@Ivan Majnaric, it looks like your job is trying to find the jar on the local filesystem instead of HDFS:
java.io.FileNotFoundException: File file:/spark-examples_2.11-2.1.0.2.6.0.3-8.jar does not exist
Please follow the article below to set up a Spark action in an Oozie workflow.
Created 07-19-2017 10:47 AM
I updated my question after your answer; can you please check it, @yvora? 🙂
Created 11-12-2018 01:18 PM
Hello Ivan,
I have the same problem with HDP 2.6.4. Did you manage to solve it?
Regards,
Chris