
Because the Spark action in Oozie is not supported in HDP 2.3.x and HDP 2.4.0, there is no direct way to run a Spark job from an Oozie workflow, especially in a Kerberos environment. As a workaround, we can use either a Java action or a shell action to launch the Spark job. In this article, we will discuss how to use the Oozie shell action to run a Spark job in a Kerberos environment.

Prerequisites:

1. The Spark client is installed on every host where a NodeManager is running. This is because we have no control over which node the shell action will run on.

2. Optionally, if the Spark job needs to interact with an HBase cluster, the HBase client needs to be installed on every NodeManager host as well.
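Because the shell action can land on any NodeManager, it can help to verify these prerequisites on each host before submitting anything. The following is a minimal bash sketch, not part of the original procedure: `check_clients` is a hypothetical helper, and the client paths you pass to it are assumptions based on the HDP `/usr/hdp/current` layout used in this article.

```shell
# Hypothetical pre-flight helper (not an HDP or Oozie tool).
# check_clients PATH...: prints OK or MISSING for each path and returns 1
# if any path is not an executable regular file, 0 otherwise.
check_clients() {
  local missing=0 bin
  for bin in "$@"; do
    if [ -x "$bin" ] && [ ! -d "$bin" ]; then
      echo "OK      $bin"
    else
      echo "MISSING $bin"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_clients /usr/hdp/current/spark-client/bin/spark-submit` on every NodeManager host, adding `/usr/hdp/current/hbase-client/bin/hbase` (assumed path) if the job uses HBase; a non-zero return means that host is missing a client.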

Steps:

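1. Create a shell script with the spark-submit command. For example, in script.sh (keytab and spark-examples.jar are the local names under which the workflow in step 3 places the keytab and the application jar in the action's working directory):

#!/bin/bash
/usr/hdp/current/spark-client/bin/spark-submit \
  --keytab keytab \
  --principal ambari-qa-falconJ@FALCONJSECURE.COM \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 500m \
  --num-executors 1 \
  --executor-memory 500m \
  --executor-cores 1 \
  spark-examples.jar 3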
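The one-line script works, but a slightly more defensive variant fails with a clear message when the keytab was not localized, and quotes every argument. This is a sketch under assumptions: `run_spark_pi` and the `SPARK_SUBMIT`, `KEYTAB` and `PRINCIPAL` override variables are hypothetical names whose defaults are copied from the command above; `keytab` is the link name created by the workflow's `<file>` element.

```shell
#!/usr/bin/env bash
# Sketch of a more defensive script.sh body. run_spark_pi and the
# SPARK_SUBMIT / KEYTAB / PRINCIPAL variables are hypothetical names.
run_spark_pi() {
  local spark_submit="${SPARK_SUBMIT:-/usr/hdp/current/spark-client/bin/spark-submit}"
  local keytab="${KEYTAB:-keytab}"   # link name from the workflow's <file> element
  local principal="${PRINCIPAL:-ambari-qa-falconJ@FALCONJSECURE.COM}"

  # Fail fast with a clear message if the keytab was not localized.
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab (working dir: $(pwd))" >&2
    return 1
  fi

  # The function returns spark-submit's exit status.
  "$spark_submit" \
    --keytab "$keytab" \
    --principal "$principal" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --driver-memory 500m \
    --num-executors 1 \
    --executor-memory 500m \
    --executor-cores 1 \
    spark-examples.jar 3
}
```

script.sh would define this function and end with a plain `run_spark_pi` call. Since that call is the last command, the script's exit status is spark-submit's status, and a non-zero exit status is what makes Oozie mark the shell action as failed.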

2. Prepare the Kerberos keytab that will be used by the Spark job. In this example, we use the Ambari smoke test user, whose keytab has already been generated by Ambari at /etc/security/keytabs/smokeuser.headless.keytab.
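Before uploading the keytab, it is worth confirming that it actually contains the principal passed to `--principal`. A minimal sketch using the MIT Kerberos `klist -kt` listing; `keytab_has_principal` is a hypothetical helper, and it assumes the default `klist -kt` layout in which the principal is the last column of each entry line.

```shell
# Hypothetical helper: keytab_has_principal KEYTAB PRINCIPAL
# Returns 0 if `klist -kt KEYTAB` lists PRINCIPAL, 1 otherwise
# (including when klist fails and prints nothing).
keytab_has_principal() {
  local keytab="$1" principal="$2"
  # Assumption: without -e, the principal is the last field of each entry.
  klist -kt "$keytab" |
    awk -v p="$principal" '$NF == p { found = 1 } END { exit !found }'
}
```

For example, `keytab_has_principal /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-falconJ@FALCONJSECURE.COM && kinit -kt /etc/security/keytabs/smokeuser.headless.keytab ambari-qa-falconJ@FALCONJSECURE.COM` checks the listing first and then confirms that the key is accepted by the KDC.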

3. Create the Oozie workflow with a shell action that executes the script created above. For example, in workflow.xml:

<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.4">
  <start to="shellAction"/>
  <action name="shellAction">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>script.sh</exec>
      <file>/user/oozie/shell/script.sh#script.sh</file>
      <file>/user/oozie/shell/smokeuser.headless.keytab#keytab</file>
      <file>/user/oozie/shell/spark-examples.jar#spark-examples.jar</file>
      <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="killAction"/>
  </action>
  <kill name="killAction">
    <message>"Killed job due to error"</message>
  </kill>
  <end name="end"/>
</workflow-app>
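Each `<file>` element copies an HDFS file into the action's working directory, and the part after `#` is the link name; this is why script.sh can refer to `keytab` and `spark-examples.jar` by relative name. To cross-check the workflow against what is uploaded in HDFS, a small bash sketch can list each HDFS path with its link name. `workflow_files` is a hypothetical helper, and the `sed` extraction assumes at most one `<file>` element per line, as in the workflow above; it is not an XML parser.

```shell
# Hypothetical helper: workflow_files WORKFLOW_XML
# Prints "HDFS_PATH LINK_NAME" for each <file> element (one per line assumed).
workflow_files() {
  local entry path link
  while IFS= read -r entry; do
    path="${entry%%#*}"          # text before the first '#'
    link="${entry#*#}"           # text after the first '#'
    # Without a '#', the file is linked under its own base name.
    if [ "$entry" = "$path" ]; then link="${path##*/}"; fi
    printf '%s %s\n' "$path" "$link"
  done < <(sed -n 's:.*<file>\(.*\)</file>.*:\1:p' "$1")
}
```

Running `workflow_files workflow.xml` on the example above prints the three HDFS paths under /user/oozie/shell together with the names script.sh, keytab and spark-examples.jar.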

4. Create the Oozie job properties file. For example, in job.properties:

nameNode=hdfs://falconJ1.sec.support.com:8020
jobTracker=falconJ2.sec.support.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shell
oozie.use.system.libpath=true
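workflow.xml resolves `${jobTracker}` and `${nameNode}` from this file, and Oozie needs `oozie.wf.application.path` to locate the workflow, so a missing or empty value for any of them causes the submission or the action to fail. A hypothetical `check_job_properties` helper (a sketch, not an Oozie tool) can catch this before submission; it only checks these three keys and treats each line as a simple `key=value` pair.

```shell
# Hypothetical helper: check_job_properties FILE
# Reports required keys that are missing or have an empty value; returns 1 if any.
check_job_properties() {
  local file="$1" key status=0
  for key in nameNode jobTracker oozie.wf.application.path; do
    # Exact key match (surrounding blanks trimmed) with a non-blank value.
    if ! awk -F= -v k="$key" '
          { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
          name == k && $2 ~ /[^ \t]/ { found = 1 }
          END { exit !found }' "$file"; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_job_properties job.properties` against the file above returns 0 and prints nothing.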

5. Upload the following files to the Oozie workflow application path in HDFS (in this example, /user/oozie/shell):

- workflow.xml

- smokeuser.headless.keytab

- script.sh

- the Spark application (uber) jar (in this example, /usr/hdp/current/spark-client/lib/spark-examples*.jar, uploaded as spark-examples.jar so that it matches the path in workflow.xml)

- any other configuration file referenced in the workflow (optional)
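These uploads can be scripted with the standard `hdfs dfs` commands. A minimal sketch under assumptions: `upload_workflow_files` is a hypothetical helper, workflow.xml and script.sh are in the current directory, and the user who uploads is the same user who submits the workflow, so restricting the keytab to mode 600 does not lock the action out. Since the keytab is a credential, it seems prudent not to leave it readable by other HDFS users.

```shell
# Hypothetical helper: upload_workflow_files APP_DIR LOCAL_KEYTAB LOCAL_JAR
# Copies the workflow files into APP_DIR (overwriting older copies with -f),
# renames the jar to spark-examples.jar, and restricts the keytab to its owner.
upload_workflow_files() {
  local app_dir="$1" keytab="$2" jar="$3"
  hdfs dfs -mkdir -p "$app_dir" &&
    hdfs dfs -put -f workflow.xml script.sh "$keytab" "$app_dir/" &&
    hdfs dfs -put -f "$jar" "$app_dir/spark-examples.jar" &&
    hdfs dfs -chmod 600 "$app_dir/${keytab##*/}"
}
```

For example: `upload_workflow_files /user/oozie/shell /etc/security/keytabs/smokeuser.headless.keytab /usr/hdp/current/spark-client/lib/spark-examples*.jar`. The glob should expand to exactly one jar; any extra matches are silently ignored.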

6. Run the Oozie command to submit this workflow. For example:

oozie job -oozie http://<oozie-server>:11000/oozie -config job.properties -run
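`oozie job ... -run` prints the new workflow's ID on a line of the form `job: <id>`, which can then be passed to `-info` to follow the workflow's status. A minimal sketch: `parse_job_id` and `submit_and_report` are hypothetical helpers, and the Oozie URL defaults to the `<oozie-server>` placeholder from the command above, so set `OOZIE_URL` first. Note that the stdout of script.sh itself ends up in the Oozie launcher's logs, not in this output.

```shell
# Hypothetical helper: parse_job_id OUTPUT
# Prints the ID from the first "job: <id>" line; returns 1 if there is none.
parse_job_id() {
  printf '%s\n' "$1" | sed -n 's/^job: *//p' | head -n 1 | grep .
}

# Hypothetical helper: submits job.properties from the current directory,
# then prints the workflow status reported by `oozie job -info`.
submit_and_report() {
  # Placeholder URL from the article; set OOZIE_URL to your Oozie server.
  local url="${OOZIE_URL:-http://<oozie-server>:11000/oozie}"
  local out job_id
  out=$(oozie job -oozie "$url" -config job.properties -run) || return 1
  job_id=$(parse_job_id "$out") || return 1
  echo "submitted $job_id"
  oozie job -oozie "$url" -info "$job_id"
}
```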

*This article was created by Hortonworks Support on 2016-04-28

Comments

When I try this with Spark 2.0, I get the following error:

[AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1474966402164_0092_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143 

Container exited with a non-zero exit code 143


@kevin shen, do you have a detailed log for this job?

log.txt
Thanks.

Hi Eyad,

I'm trying to execute a Spark2 action using the Shell Action in Oozie. I've tried the exact same steps as above but I'm stuck at the point below:

It just keeps on printing this forever in the stdout logs of the Oozie Launcher:

>>> Invoking Shell command line now >>

Stdoutput Testing Shell Action
Heart beat
Heart beat
Heart beat
Heart beat

There is no error either. Can you suggest what I am doing wrong?

workflow.txt

job-properties.txt

echo.txt


Can you elaborate a bit on how to set up the environment properly in the shell wrapper before calling spark-submit? Which user should the action run as (owner, yarn, spark, or oozie)?

We've had a lot of problems getting the setup right when we implemented shell actions that wrap Hive queries (to process query output). spark-submit itself is a shell wrapper that does a lot of environment initialization, so I imagine it won't be smooth.

Thanks!

Miles


I'm facing the same error with the Oozie shell action and Spark 1.6 on HDP 2.4. Are there any steps available to resolve it?


@egarelnabi Hi,

I'm trying this method and I'm also getting the "Heart beat" message continuously.

Any idea?

Thanks