
Write to Phoenix Table is failing through Oozie Spark2 Action


Hi Team,

I have a Spark job that writes to a Phoenix table. I can run the job with the spark-submit command below, but when I schedule it through an Oozie Spark2 action, it fails.
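For context, the write itself in model_main.py is a plain Phoenix-Spark save, roughly like this (the table name, zkUrl, and input path are simplified placeholders, not the real values):

    # Simplified sketch of the write in model_main.py; TEST_TABLE, the zkUrl,
    # and the input path are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Model").getOrCreate()
    df = spark.read.parquet("/user/shyam/model_input")

    # Phoenix requires mode("overwrite") for DataFrame saves; rows are upserted.
    df.write \
        .format("org.apache.phoenix.spark") \
        .option("table", "TEST_TABLE") \
        .option("zkUrl", "zk1.example.com:2181:/hbase-secure") \
        .mode("overwrite") \
        .save()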

spark-submit \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar \
  --conf spark.speculation=false \
  --master yarn \
  --deploy-mode=cluster \
  --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.5.1052-6.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar \
  --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.5.1052-6.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar \
  --conf spark.yarn.queue=default \
  --files "/home/shyam/test.properties,/etc/spark2/conf/hbase-site.xml" \
  --py-files test.zip \
  model_main.py

Extra jars required to run the job:

/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar

Error message in the log:

Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:679) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:637) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:745) ... 17 more 
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] 
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) 
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) 
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734) 

I am passing the keytab in workflow.xml via a <file> tag, and the extra jars via <spark-opts>. My understanding is that, unlike the spark-submit case (where I kinit on the edge node first), the containers Oozie launches have no Kerberos TGT when the HBase connection is opened.
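Since the keytab is already shipped to the container via the <file> tag, one workaround I am considering (untested) is to grab a ticket from inside the driver before the write starts; the principal below is a placeholder for my real one:

    # Untested workaround sketch: kinit inside the driver using the keytab
    # localized by the <file> tag. "shyam@EXAMPLE.COM" is a placeholder.
    import subprocess

    subprocess.check_call(["kinit", "-kt", "shyam.keytab", "shyam@EXAMPLE.COM"])

The action in workflow.xml: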

<action name="model_main" cred="hive_auth">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>Model</name>
            <jar>model_main.py</jar>
            <spark-opts>${sparkOpts}</spark-opts>
            <file>${nameNode}${filePath}/model_main.py#model_main.py</file>
            <file>${filePath}/dist#dist</file>
            <file>${filePath}/test.properties#test.properties</file>
            <file>${filePath}/shyam.keytab#shyam.keytab</file>
            <file>${filePath}/hbase-site.xml#hbase-site.xml</file>
        </spark>
        <!-- ok/error transitions omitted -->
</action>
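The hive_auth credential is defined in the workflow's <credentials> section. I am also wondering whether the action needs an HBase credential there, so that Oozie obtains an HBase delegation token before launching it; something like the following (untested, and the credential name and property values are my guesses):

    <credentials>
        <credential name="hbase_auth" type="hbase">
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
            </property>
            <property>
                <name>hbase.security.authentication</name>
                <value>kerberos</value>
            </property>
        </credential>
    </credentials>
    <!-- with cred="hive_auth,hbase_auth" on the action -->

The job.properties for the workflow: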
nameNode=hdfs://testhdp
filePath=/user/shyam/code/oozie_source/model_test
oozie.wf.application.path=${nameNode}${filePath}
jobTracker=yarn-cluster
master=yarn
mode=cluster
oozie.action.sharelib.for.spark=spark2
oozie.use.system.libpath=true
workflow_name=model
queueName=default
user=shyam
kerb_auth=keytab
python_path=/opt/anaconda2/envs/python3/bin/python3
phoenix_jars=/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.5.1052-6.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
libDir=${nameNode}${filePath}/lib
sparkLib=${libDir}/hbase-client.jar,${libDir}/hbase-common.jar,${libDir}/hbase-server.jar,${libDir}/hbase-protocol.jar,${libDir}/hive-hbase-handler.jar
#hbase_jars=/usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar
sparkOpts=--jars file:///usr/hdp/current/hbase-client/lib/hbase-client.jar,file:///usr/hdp/current/hbase-client/lib/hbase-common.jar,file:///usr/hdp/current/hbase-client/lib/hbase-server.jar,file:///usr/hdp/current/hbase-client/lib/hbase-protocol.jar,file:///usr/hdp/current/hive-client/lib/hive-hbase-handler.jar \
  --master yarn --deploy-mode cluster \
  --conf spark.executor.extraClassPath=${phoenix_jars} \
  --conf spark.driver.extraClassPath=${phoenix_jars} \
  --conf spark.yarn.queue=${queueName} \
  --conf spark.executorEnv.PYSPARK_PYTHON=${python_path} \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${python_path} \
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=${python_path} \
  --files "test.properties,hbase-site.xml,shyam.keytab" \
  --py-files dist/*.egg
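At runtime, model_main.py locates the shipped properties file through SparkFiles; roughly like this (simplified, the actual parsing is omitted):

    # Files distributed via --files land in the container working directory;
    # SparkFiles.get returns the local path for a given file name.
    from pyspark import SparkFiles

    props_path = SparkFiles.get("test.properties")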

I also tried placing the dependent jars in the lib folder of the application path and passing them through <spark-opts>, but that didn't work either; it failed with java.lang.IllegalStateException: Variable substitution depth too large: 20.

java.lang.IllegalStateException: Variable substitution depth too large: 20 --jars ${sparkLib} --conf spark.speculation=false --master yarn --deploy-mode cluster --conf spark.executor.extraClassPath=${phoenix_jars} --conf spark.driver.extraClassPath=${phoenix_jars} --conf spark.yarn.queue=${queueName} --conf spark.executorEnv.PYSPARK_PYTHON=${python_path} --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${python_path} --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=${python_path} --files "test.properties,hbase-site.xml,shyam.keytab" --py-files dist/*.egg
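I suspect the limit is hit because the single sparkOpts value needs one substitution for ${sparkLib}, five more for the ${libDir} references inside it, ten more for the ${nameNode}${filePath} pairs inside those, plus the repeated ${phoenix_jars} and ${python_path} references, which adds up to more than 20. If that is right, inlining libDir (its value is already fixed by nameNode and filePath above) should bring the count back under the limit; untested sketch:

    # Untested: the same paths with the nameNode/filePath nesting flattened,
    # so the sparkOpts value needs fewer than 20 variable substitutions.
    libDir=hdfs://testhdp/user/shyam/code/oozie_source/model_test/lib
    sparkLib=${libDir}/hbase-client.jar,${libDir}/hbase-common.jar,${libDir}/hbase-server.jar,${libDir}/hbase-protocol.jar,${libDir}/hive-hbase-handler.jar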

Any help on this would be appreciated.