
spark-submit failing to connect to metastore due to Kerberos: Caused by GSSException: No valid credentials provided. But works in local-client mode



It seems that, in Docker, the PySpark (2.3.0) shell in local-client mode works and is able to connect to Hive. However, issuing spark-submit with all dependencies fails with the error below:

20/08/24 14:03:01 INFO storage.BlockManagerMasterEndpoint: Registering block manager test.server.com:41697 with 6.2 GB RAM, BlockManagerId(3, test.server.com, 41697, None)
20/08/24 14:03:02 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
20/08/24 14:03:02 INFO hive.metastore: Trying to connect to metastore with URI thrift://metastore.server.com:9083
20/08/24 14:03:02 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)

Running a simple Pi example with spark-submit in yarn-cluster mode through a PySpark script works fine with no Kerberos issues, but when trying to access the Hive metastore I get the Kerberos error.

 

Spark-submit command:

spark-submit --master yarn --deploy-mode cluster --files=/etc/hive/conf/hive-site.xml,/etc/hive/conf/yarn-site.xml,/etc/hive/conf/hdfs-site.xml,/etc/hive/conf/core-site.xml,/etc/hive/conf/mapred-site.xml,/etc/hive/conf/ssl-client.xml  --name fetch_hive_test --executor-memory 12g --num-executors 20 test_hive_minimal.py

test_hive_minimal.py is a simple PySpark script that shows the tables in a test db:

from pyspark.sql import SparkSession

appName = "test_hive_minimal"
master = "yarn"

sc = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .enableHiveSupport() \
    .config("spark.hadoop.hive.enforce.bucketing", "True") \
    .config("spark.hadoop.hive.support.quoted.identifiers", "none") \
    .config("hive.exec.dynamic.partition", "True") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .getOrCreate()

sql = "show tables in user_tables"
df_new = sc.sql(sql)
df_new.show()
sc.stop()

 

Can anyone throw some light on how to fix this? The keytab is fine, because Hadoop can be accessed from the Docker terminal. Aren't Kerberos tickets managed automatically by YARN? I tried passing the keytab and principal, but it did not help. What seems to be the issue here?
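
For reference, the keytab/principal variant I tried looked roughly like the following (the principal name and keytab path here are placeholders, not the actual values used):

spark-submit --master yarn --deploy-mode cluster \
  --principal svc_user@EXAMPLE.REALM.COM \
  --keytab /path/to/svc_user.keytab \
  --files=/etc/hive/conf/hive-site.xml,/etc/hive/conf/yarn-site.xml,/etc/hive/conf/hdfs-site.xml,/etc/hive/conf/core-site.xml,/etc/hive/conf/mapred-site.xml,/etc/hive/conf/ssl-client.xml \
  --name fetch_hive_test --executor-memory 12g --num-executors 20 test_hive_minimal.py

This still produced the same GSSException when connecting to the metastore.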

 

This is CDH 5.13 with Spark 2.3.

 

 

 
