New Contributor
Posts: 1
Registered: ‎01-14-2016

Oozie workflow cannot run Spark action on YARN

I am using CDH 5.5.1 (Oozie 4.1 and Spark 1.5.1), and I'd like to run a Spark job on YARN by submitting it through an Oozie workflow.

The Spark app is the Pi program contained in the example jar provided by CDH (spark-examples-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar), and the main class is "org.apache.spark.examples.SparkPi".

My Oozie workflow definition directory is laid out as follows:

- spark/
   - job.properties
   - workflow.xml
   - lib/
      - spark-examples-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar

 

 

job.properties is as follows:

 

NameNode=hdfs://nameservice1
RM=ecs2.njzd.com:8032
Master=yarn-client

oozie.use.system.libpath=true
oozie.wf.application.path=${NameNode}/user/tao/oozie/examples/apps/xt/spark

workflow.xml is as follows:

<workflow-app name="WF-Spark" xmlns="uri:oozie:workflow:0.5">
  <start to='spark-node'/>

  <action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${RM}</job-tracker>
      <name-node>${NameNode}</name-node>
      <master>${Master}</master>
      <name>Spark-Job-Pi</name>
      <class>org.apache.spark.examples.SparkPi</class>
      <jar>${NameNode}/user/tao/oozie/examples/apps/xt/spark/lib/spark-examples-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar</jar>
    </spark>
    <ok to="end-node"/>
    <error to="fail-node"/>
  </action>

  <kill name="fail-node">
    <message>Spark Job Failed!</message>
  </kill>
  <end name="end-node"/>
</workflow-app>

 

I uploaded the workflow definition directory to HDFS and submitted the workflow job. The launcher succeeded, but the Spark action failed with the following error:

Application application_1451571118974_0127 failed 2 times due to AM Container for appattempt_1451571118974_0127_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ecs2.njzd.com:8088/proxy/application_1451571118974_0127/Then, click on links to logs of each attempt.
Diagnostics: Resource hdfs://ecs3.njzd.com:8020/user/root/.sparkStaging/application_1451571118974_0127/spark-yarn_2.10-1.5.0-cdh5.5.1.jar changed on src filesystem (expected 1452757035811, was 1452757065894
java.io.IOException: Resource hdfs://ecs3.njzd.com:8020/user/root/.sparkStaging/application_1451571118974_0127/spark-yarn_2.10-1.5.0-cdh5.5.1.jar changed on src filesystem (expected 1452757035811, was 1452757065894
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.

The error log says "Resource hdfs://..../spark-yarn_2.10-1.5.0-cdh5.5.1.jar changed on src filesystem (expected 1452757035811, was 1452757065894", and I cannot work out the cause.

I'd like to know the reason and how to resolve it.

Thanks.

 

New Contributor
Posts: 5
Registered: ‎03-17-2016

Re: Oozie workflow cannot run Spark action on YARN

I just ran into the same issue. Spark+OOZIE on CDH5.5 is riddled with bugs. Let me know if you find a solution.

Posts: 819
Kudos: 93
Solutions: 47
Registered: ‎04-06-2015

Re: Oozie workflow cannot run Spark action on YARN

@honzasterba Could you start a new thread explaining the other issues that led you to conclude the combination is "riddled with bugs", so that we don't muddy this thread? Also, are you seeing exactly the same error as in this thread, or something similar?

Cy Jervis, Community Manager


Was your question answered? Make sure to mark it as an accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum

New Contributor
Posts: 5
Registered: ‎03-17-2016

Re: Oozie workflow cannot run Spark action on YARN

Hello,


I am seeing exactly this issue, although my environment is a little non-standard: I am not using Oozie's sharelib/spark but a Spark bundled with my application.

I was eventually able to work around it by setting spark.yarn.jar to my application's uber-jar.
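
For anyone trying the same thing, here is a minimal sketch of how that setting could be passed, assuming the spark-action 0.1 schema's <spark-opts> element is used for it. The jar name (my-app-uber.jar), job name and class (com.example.MyApp) are hypothetical placeholders, ${RM}, ${NameNode} and ${Master} are the job.properties names from the original post, and the uber-jar is assumed to bundle the Spark classes itself:

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${RM}</job-tracker>
    <name-node>${NameNode}</name-node>
    <master>${Master}</master>
    <name>My-Spark-Job</name>
    <class>com.example.MyApp</class>
    <jar>${wf:appPath()}/lib/my-app-uber.jar</jar>
    <!-- Assumption: spark.yarn.jar points at the same uber-jar as the jar element above; the path is a placeholder. -->
    <spark-opts>--conf spark.yarn.jar=${wf:appPath()}/lib/my-app-uber.jar</spark-opts>
</spark>

Whether this avoids the error presumably depends on the uber-jar really containing everything that spark.yarn.jar is expected to provide.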

I have created at least one topic about my issues and will create more if you can promise that somebody will look into them.

First one: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Cloudera-5-5-2-Oozie-Spark-job-secur...

Second one: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/Enable-to-run-Spark-from-OOZIE/m-p/3...

Posts: 819
Kudos: 93
Solutions: 47
Registered: ‎04-06-2015

Re: Oozie workflow cannot run Spark action on YARN

Thanks for the additional information. I did notice your other post on the topic, which is why I asked. As for replies to posts, this is mainly a peer-to-peer community, but there are instances where Clouderans reply to posts.

Cy Jervis, Community Manager



New Contributor
Posts: 5
Registered: ‎03-17-2016

Re: Oozie workflow cannot run Spark action on YARN

I would think that the fact that Oozie-Spark integration is utterly broken in 5.5.2 should be of interest to many Clouderans...

New Contributor
Posts: 1
Registered: ‎04-01-2016

Re: Oozie workflow cannot run Spark action on YARN

Hi guys.

Try this: wrap the Spark action in a sub-workflow and reference that sub-workflow from the main workflow.
Something like this:

- spark/
  - job.properties
  - workflow.xml
  - lib/
    - spark-examples-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar
  - subflow/
    - workflow.xml

 

workflow.xml (main) is as follows:

<workflow-app name="WF-Spark" xmlns="uri:oozie:workflow:0.5">
    <start to='spark-node'/>
    
    <action name='spark-node'>
        <sub-workflow>
            <app-path>${wf:appPath()}/subflow</app-path>
            <propagate-configuration/>
            <configuration>
                <property>
                    <name>mainAppPath</name>
                    <value>${wf:appPath()}</value>
                </property>
            </configuration>
        </sub-workflow>
        
        <ok to="end-node"/>
        <error to="fail-node"/>
    </action>

    <kill name="fail-node">
        <message>Spark Job Failed!</message>
    </kill>

    <end name="end-node"/>
</workflow-app>

subflow/workflow.xml (subflow) is as follows:

<workflow-app name="SUB-WF-Spark" xmlns="uri:oozie:workflow:0.5">
    <start to='spark-node'/>
    
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${RM}</job-tracker>
            <name-node>${NameNode}</name-node>
            <master>${Master}</master>
            <name>Spark-Job-Pi</name>
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>${mainAppPath}/lib/spark-examples-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar</jar>
        </spark>
        
        <ok to="end-node"/>
        <error to="fail-node"/>
    </action>

    <kill name="fail-node">
        <message>Spark Job Failed!</message>
    </kill>
    
    <end name="end-node"/>
</workflow-app>

 

It works for me for now.

New Contributor
Posts: 1
Registered: ‎09-13-2016

Re: Oozie workflow cannot run Spark action on YARN

Without the wrapper, this works for me:

 

<workflow-app name="WF-Spark" xmlns="uri:oozie:workflow:0.5">
    <start to='spark-node'/>
    
    <action name='spark-node'>
    
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark-Job-Pi</name>
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>${wf:appPath()}/lib/spark-examples-1.5.0-cdh5.5.4-hadoop2.6.0-cdh5.5.4.jar</jar>
        </spark>
        
        <ok to="end-node"/>
        <error to="fail-node"/>
    </action>

    <kill name="fail-node">
        <message>Spark Job Failed!</message>
    </kill>

    <end name="end-node"/>
</workflow-app>