
Running spark2 job through Oozie shell action?


Hi all,

As mentioned in the title, I'm trying to run a shell action that kicks off a Spark job, but unfortunately I'm consistently getting the following error:

19/05/10 14:03:39 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Could not set up IO Streams to <hbaseregionserver>
Fri May 10 14:03:39 BST 2019, RpcRetryingCaller{globalStartTime=1557493419339, pause=100, retries=2}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: <hbaseregionserver>

I've been playing around trying to get the script to pick up the Kerberos ticket, but with no luck. As far as I can tell, the Oozie job isn't passing the Kerberos ticket through to the shell action. Any ideas why it isn't being picked up? I'm at a loss. The related code is below.

Oozie workflow action

 <action name="sparkJ" cred="hive2Cred">
        <shell xmlns="uri:oozie:shell-action:0.1">
                <!-- job-tracker, name-node, exec, and file elements trimmed -->
                <capture-output />
        </shell>
        <ok to="end" />
        <error to="fail" />
 </action>
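
For context, a fuller version of the action would look something like the sketch below. The job-tracker/name-node values and file names are placeholders, not my real ones; the point is that the script, the keytab, and hbase-site.xml all have to be shipped to the launcher container via <file> elements or the script can't see them:

```xml
<!-- Sketch only: ${jobTracker}/${nameNode} and the file names are placeholders -->
<action name="sparkJ" cred="hive2Cred">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run_spark.sh</exec>
        <file>run_spark.sh#run_spark.sh</file>
        <file>mykeytab.keytab#mykeytab.keytab</file>
        <file>hbase-site.xml#hbase-site.xml</file>
        <capture-output />
    </shell>
    <ok to="end" />
    <error to="fail" />
</action>
```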

Shell script
export job_name=SPARK_JOB

export num_executors=10
export executor_memory=1G
export queue=YARNQ
export max_executors=50
echo "[[[[[[[[[[[[[ Starting Job - name:${job_name}, configuration:${configuration} ]]]]]]]]]]]]]]"

/usr/hdp/current/spark2-client/bin/spark-submit \
--name ${job_name} \
--driver-java-options "-Dlog4j.configuration=file:./" \
--num-executors ${num_executors} \
--executor-memory ${executor_memory} \
--master yarn \
--keytab KEYTAB \
--principal KPRINCIPAL \
--supervise \
--deploy-mode cluster \
--queue ${queue} \
--files "./${configuration},./hbase-site.xml,./" \
--conf spark.driver.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
--conf spark.executor.extraJavaOptions=" -Dlog4j.configuration=file:./"  \
--conf spark.executor.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
--conf spark.streaming.stopGracefullyOnShutdown=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=${max_executors} \
--conf spark.streaming.concurrentJobs=2 \
--conf spark.streaming.backpressure.enabled=true \
--conf \
--conf \
--conf spark.streaming.kafka.maxRatePerPartition=5000 \
--conf \
--conf \
--conf spark.streaming.backpressure.initialRate=5000 \
--jars /usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
--class myclass myjar.jar ./${configuration}
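
One direction I've been experimenting with is re-authenticating inside the launcher container itself, since the shell action runs in a YARN container on an arbitrary NodeManager host where no user ticket cache exists. This is only a sketch: the keytab path and principal are placeholders, and unsetting HADOOP_TOKEN_FILE_LOCATION is a workaround I've seen suggested for Oozie launchers on HDP, not something I can confirm:

```shell
#!/usr/bin/env bash
# Sketch (assumptions): KEYTAB/KPRINCIPAL mirror the placeholders used in
# the spark-submit call above; the keytab is shipped to the container via
# the action's <file> element.
KEYTAB="${KEYTAB:-mykeytab.keytab}"
KPRINCIPAL="${KPRINCIPAL:-user@EXAMPLE.COM}"

# Acquire a fresh TGT in the launcher container before spark-submit runs.
if command -v kinit >/dev/null 2>&1; then
    kinit -kt "${KEYTAB}" "${KPRINCIPAL}"
else
    echo "kinit not found; skipping (not on a cluster node)"
fi

# Reportedly, on some HDP releases spark-submit picks up the launcher's
# delegation-token file and never uses the TGT; clearing it forces TGT auth.
unset HADOOP_TOKEN_FILE_LOCATION
echo "auth setup done"
```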

Many thanks for any help you can provide.


The above was originally posted in the Community Help track. On Sat May 11 02:53 UTC 2019, the HCC moderation staff moved it to the Security track. The Community Help track is appropriate for questions about using the HCC Community site itself.

Bill Brooks, Community Moderator