
Running spark2 job through Oozie shell action?

Contributor

Hi all,


As mentioned in the title, I'm trying to run a shell action that kicks off a Spark job, but unfortunately I'm consistently getting the following error:


19/05/10 14:03:39 ERROR AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
 javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
java.io.IOException: Could not set up IO Streams to <hbaseregionserver>
 Fri May 10 14:03:39 BST 2019, RpcRetryingCaller{globalStartTime=1557493419339, pause=100, retries=2}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: <hbaseregionserver>
 


I've been playing around trying to get the script to pick up the Kerberos ticket, but with no luck. As far as I can tell, the problem is that the Oozie job isn't passing the Kerberos ticket through to the shell action. Any ideas why it isn't being picked up? I'm at a loss. The related code is below.


Oozie workflow action

<action name="sparkJ" cred="hive2Cred">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${oozieQueueName}</value>
            </property>
        </configuration>
        <exec>run.sh</exec>
        <file>/thePathToTheScript/run.sh#run.sh</file>
        <file>/thePathToTheProperties/myp.properties#myp.properties</file>
        <capture-output />
    </shell>
    <ok to="end" />
    <error to="fail" />
</action>
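
One thing I'm not sure about: the workflow only ships run.sh and myp.properties, while the script below expects the keytab, hbase-site.xml, jaas.conf and log4j.properties to be in its working directory. If that's part of the problem, I assume they would need to be localised with extra <file> entries, along these lines (the paths here are just placeholders, not my real ones):

        <exec>run.sh</exec>
        <file>/thePathToTheScript/run.sh#run.sh</file>
        <file>/thePathToTheProperties/myp.properties#myp.properties</file>
        <!-- hypothetical extra entries so kinit and spark-submit can find these files in the action's working directory -->
        <file>/thePathToTheKeytab/KEYTAB#KEYTAB</file>
        <file>/thePathToTheConfigs/hbase-site.xml#hbase-site.xml</file>
        <file>/thePathToTheConfigs/jaas.conf#jaas.conf</file>
        <file>/thePathToTheConfigs/log4j.properties#log4j.properties</file>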



Shell script
#!/bin/sh
export job_name=SPARK_JOB
export configuration=myp.properties

export num_executors=10
export executor_memory=1G
export queue=YARNQ
export max_executors=50
kinit -kt KEYTAB KPRINCIPAL
echo "[[[[[[[[[[[[[ Starting Job - name:${job_name}, configuration:${configuration} ]]]]]]]]]]]]]]"

/usr/hdp/current/spark2-client/bin/spark-submit \
--name ${job_name} \
--driver-java-options "-Dlog4j.configuration=file:./log4j.properties" \
--num-executors ${num_executors} \
--executor-memory ${executor_memory} \
--master yarn \
--keytab KEYTAB \
--principal KPRINCIPAL \
--supervise \
--deploy-mode cluster \
--queue ${queue} \
--files "./${configuration},./hbase-site.xml,./log4j.properties" \
--conf spark.driver.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
--conf spark.executor.extraJavaOptions="-Djava.security.auth.login.config=./jaas.conf -Dlog4j.configuration=file:./log4j.properties"  \
--conf spark.executor.extraClassPath="/usr/hdp/current/hive-client/lib/datanucleus-*.jar:/usr/hdp/current/tez-client/*.jar" \
--conf spark.streaming.stopGracefullyOnShutdown=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=${max_executors} \
--conf spark.streaming.concurrentJobs=2 \
--conf spark.streaming.backpressure.enabled=true \
--conf spark.yarn.security.tokens.hive.enabled=true \
--conf spark.yarn.security.tokens.hbase.enabled=true \
--conf spark.streaming.kafka.maxRatePerPartition=5000 \
--conf spark.streaming.backpressure.pid.maxRate=3000 \
--conf spark.streaming.backpressure.pid.minRate=200 \
--conf spark.streaming.backpressure.initialRate=5000 \
--jars /usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
--class myclass myjar.jar ./${configuration}
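
For debugging, a quick check like this right after the kinit call should at least confirm whether a ticket is actually being created inside the Oozie launcher container (klist -s prints nothing and just returns a non-zero exit status when there is no valid ticket in the cache):

kinit -kt KEYTAB KPRINCIPAL
# sanity check: fail fast if kinit did not leave a valid ticket in the credential cache
if ! klist -s; then
    echo "No valid Kerberos ticket found after kinit - aborting" >&2
    exit 1
fi
# log the principal and ticket lifetimes so they show up in the launcher's stdout
klist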


Many thanks for any help you can provide.

1 REPLY

Re: Running spark2 job through Oozie shell action?

Community Manager

The above was originally posted in the Community Help track. On Sat May 11 02:53 UTC 2019, the HCC moderation staff moved it to the Security track. The Community Help track is appropriate for questions about using the HCC Community site itself.