Created on 08-15-2016 11:55 PM
Because the Spark action in Oozie is not supported in HDP 2.3.x and HDP 2.4.0, and there is no out-of-the-box option, especially in a Kerberos environment, we can use either a Java action or a shell action to launch a Spark job from an Oozie workflow. In this article, we will discuss how to use an Oozie shell action to run a Spark job in a Kerberos environment.
Prerequisites:
1. The Spark client is installed on every host where a NodeManager is running. This is because we have no control over which node the shell action will be launched on.
2. Optionally, if the Spark job needs to interact with an HBase cluster, the HBase client needs to be installed on every host as well.
Steps:
1. Create a shell script with the spark-submit command. For example, in script.sh:
/usr/hdp/current/spark-client/bin/spark-submit \
  --keytab keytab \
  --principal ambari-qa-falconJ@FALCONJSECURE.COM \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 500m --num-executors 1 \
  --executor-memory 500m --executor-cores 1 \
  spark-examples.jar 3
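The keytab is referenced by the short name "keytab" because the workflow in step 3 symlinks it into the container's working directory under that name. If your Spark build does not honor --keytab/--principal in yarn-client mode, a minimal alternative sketch is to authenticate explicitly in the script before submitting (same example principal as above; adjust to your own realm):

# Sketch: obtain a Kerberos ticket from the symlinked keytab, then
# submit without --keytab/--principal (for Spark builds that lack
# those flags in yarn-client mode).
kinit -kt keytab ambari-qa-falconJ@FALCONJSECURE.COM
/usr/hdp/current/spark-client/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 500m --num-executors 1 \
  --executor-memory 500m --executor-cores 1 \
  spark-examples.jar 3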
2. Prepare the Kerberos keytab which will be used by the Spark job. In this example we use the Ambari smoke test user, whose keytab is already generated by Ambari in /etc/security/keytabs/smokeuser.headless.keytab.
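To confirm which principal a keytab actually holds before wiring it into the workflow, you can list its entries with klist from the standard Kerberos client tools:

# List the principals and timestamps stored in the smoke-test keytab
klist -kt /etc/security/keytabs/smokeuser.headless.keytab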
3. Create the Oozie workflow with a shell action which will execute the script created above. For example, in workflow.xml:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.4">
    <start to="shellAction"/>
    <action name="shellAction">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>script.sh</exec>
            <file>/user/oozie/shell/script.sh#script.sh</file>
            <file>/user/oozie/shell/smokeuser.headless.keytab#keytab</file>
            <file>/user/oozie/shell/spark-examples.jar#spark-examples.jar</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="killAction"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>
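The #name suffix on each <file> element makes Oozie symlink the HDFS file into the action's working directory under that short name, which is why script.sh can reference "keytab" and "spark-examples.jar" without paths. Note also that queueName is defined in job.properties below but never referenced here; if you want the launcher routed to that queue, a sketch of the usual addition is a <configuration> block placed before <exec> inside the shell element:

<!-- Sketch: route the shell action's launcher to the queue from job.properties -->
<configuration>
    <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
    </property>
</configuration>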
4. Create the Oozie job properties file. For example, in job.properties:
nameNode=hdfs://falconJ1.sec.support.com:8020
jobTracker=falconJ2.sec.support.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/shell
oozie.use.system.libpath=true
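Before uploading anything, the workflow definition can be sanity-checked; the Oozie CLI ships a validate subcommand that checks an XML file against the workflow schema (assuming the oozie client is on your PATH):

# Validate the workflow definition against the Oozie schema
oozie validate workflow.xml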
5. Upload the following files created above to the Oozie workflow application path in HDFS (in this example: /user/oozie/shell); a sample upload sketch follows the list:
- workflow.xml
- smokeuser.headless.keytab
- script.sh
- the Spark examples uber jar (in this example: /usr/hdp/current/spark-client/lib/spark-examples*.jar)
- any other configuration files mentioned in the workflow (optional)
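A minimal upload sketch, assuming the files sit in your current local directory and you hold a Kerberos ticket for a user with write access to /user/oozie/shell:

# Stage the workflow artifacts in HDFS (paths taken from this example)
hdfs dfs -mkdir -p /user/oozie/shell
hdfs dfs -put -f workflow.xml script.sh /user/oozie/shell/
hdfs dfs -put -f /etc/security/keytabs/smokeuser.headless.keytab /user/oozie/shell/
# Copy the versioned examples jar under the exact name the workflow references
hdfs dfs -put -f /usr/hdp/current/spark-client/lib/spark-examples*.jar /user/oozie/shell/spark-examples.jar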
6. Execute the Oozie command to run this workflow. For example:
oozie job -oozie http://<oozie-server>:11000/oozie -config job.properties -run
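On success, the submit command echoes a workflow job ID; the workflow's progress can then be polled with the same CLI (<job-id> below stands for that echoed ID, and <oozie-server> is the same placeholder as above):

# Check the status of the submitted workflow
oozie job -oozie http://<oozie-server>:11000/oozie -info <job-id>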
*This article was created by Hortonworks Support on 2016-04-28
Created on 10-28-2016 03:25 AM
When I try this with Spark 2.0, I get the following error:
[AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1474966402164_0092_m_000000_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Created on 10-31-2016 09:56 PM
Do you have a detailed log for this job? @kevin shen
Created on 11-17-2016 01:07 PM
Yes, it is true. For details, check the HDP 2.4.0 release notes.
Created on 01-17-2017 04:55 AM
Hi Eyad,
I'm trying to execute a Spark2 action using the Shell Action in Oozie. I've tried the exact same steps as above but I'm stuck at the point below:
It just keeps on printing this forever in the stdout logs of the Oozie Launcher:
>>> Invoking Shell command line now >>
Stdoutput Testing Shell Action
Heart beat
Heart beat
Heart beat
Heart beat
There is no error either, so please suggest: what am I doing wrong?
Created on 01-19-2017 09:33 PM
Can you elaborate a bit on how to set up the environment properly in the shell wrapper before calling spark-submit? Which user should the action run as (owner/yarn/spark/oozie)?
We had a lot of problems getting the setup right when we implemented shell actions that wrap Hive queries (to process the query output). spark-submit is itself a shell wrapper that does a lot of environment initialization, so I imagine it won't be smooth.
Thanks!
Miles
Created on 04-22-2017 08:38 AM
Facing the same error with an Oozie shell action and Spark 1.6 on HDP 2.4. Are any steps available to resolve this one?
Created on 01-20-2021 07:15 AM
@egarelnabi Hi
I'm trying this method and I'm also getting the Heart beat message continuously.
Any idea?
Thanks