Support Questions
Find answers, ask questions, and share your expertise

Kerberos error while authenticating with KMS endpoint in pyspark

Expert Contributor

Hello,

I am getting an error after upgrade of CDH5.16 to CDP7.1.7. The logs show it is unable to connect to the KMS endpoint. First start a Spark session in sparkmagic then I run below example pyspark code:

 

Starting Spark application
ID YARN Application ID Kind State Spark UI Driver log Current session?
23application_1639802810085_6070pysparkidleLinkLink
 
SparkSession available as 'spark'.

###############sample pyspark code###################

from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').getOrCreate()

# load data from .csv file in HDFS
# tips = spark.read.csv("/user/hive/warehouse/tips/", header=True, inferSchema=True)

# OR load data from table in Hive metastore
tips = spark.table('db1.table1')

from pyspark.sql.functions import col, lit, mean

# query using DataFrame API
#tips \

# query using SQL
spark.sql("select name from db1.table1").show(3)


spark.stop()

 

 

An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:255)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:252)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
	at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.getDelegationToken(LoadBalancingKMSClientProvider.java:252)

Caused by: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1916)
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1029)
	... 67 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://kmshostxyz.com:16000/kms/v1/?op=GETDELEGATIONTOKEN&renewer=yarn%2Fyarnhost%40KERBEROSREALM
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:237)


at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	... 68 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:365)
	at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
	... 78 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)

 

	... 79 more

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 381, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)

 

There is also an old related thread but no resolution:

https://community.cloudera.com/t5/Support-Questions/Not-able-to-access-the-files-in-HDFS-encryption-...

2 REPLIES 2

@ebeb Are you able to kinit from the host with the existed keytab? That’s a vlid check to start with and then see where is the issue. 


Cheers!
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Expert Contributor
Hello @GangWar ,
Yes I can do kinit -kt for all the userids including yarn, livy and own userid from the same server.
; ;