Spark Kerberos Ticket Timing out Despite Providing Principal and Keytab


I am submitting a Spark job and setting both spark.yarn.keytab and spark.yarn.principal. The logs indicate that both values are being picked up correctly:

2019-01-28 16:48:45 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:45 INFO Client: Attempting to login to the Kerberos using principal: USERNAME and keytab: /home/USERNAME/USERNAME.keytab
2019-01-28 16:48:58 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:58 INFO HadoopFSCredentialProvider: getting token for: hdfs://nameservice1/user/USERNAME
 
However, after about 8 hours of running, the job fails with the exception below, which reports that there is no valid Kerberos ticket:
 
 
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 27 more
 
 
The job has three stages: NLP, indexing the data, and writing to Parquet. The NLP and indexing stages both complete; the exception occurs during the Parquet write. I was under the impression that when a keytab is supplied, the ticket stays valid for the duration of the job. Is this not the case?
 
(The job is being submitted using SparkLauncher and pointing to a jar. This is essentially the same as using spark-submit.)
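
For reference, the equivalent spark-submit invocation can be sketched as below. This is only an illustration: the jar name, main class, and the choice of YARN cluster mode are placeholders and assumptions, not the actual job. In Spark 2.x, the --principal and --keytab options populate spark.yarn.principal and spark.yarn.keytab.

```python
import shlex


def build_spark_submit(principal, keytab, app_jar, main_class):
    """Return the argv list for a YARN submission that passes a keytab."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",  # assumption: deploy mode is not stated in the post
        # In Spark 2.x these two options set spark.yarn.principal / spark.yarn.keytab.
        "--principal", principal,
        "--keytab", keytab,
        "--class", main_class,
        app_jar,
    ]


cmd = build_spark_submit(
    principal="USERNAME",
    keytab="/home/USERNAME/USERNAME.keytab",
    app_jar="my-job.jar",                # hypothetical jar name
    main_class="com.example.MainClass",  # hypothetical main class
)

# Print the command as a single shell-quoted line.
print(shlex.join(cmd))
```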
2 REPLIES

The Spark documentation notes that:

Long-running applications may run into issues if their run time exceeds the maximum delegation token lifetime configured in services it needs to access.

You should check whether delegation tokens are enabled and whether the maximum token lifetime is set to something shorter than the time it takes to run your job.
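
One way to do that check is to read the HDFS delegation token settings out of hdfs-site.xml and compare them with the expected run time. The sketch below assumes the relevant properties are dfs.namenode.delegation.token.max-lifetime and dfs.namenode.delegation.token.renew-interval (both in milliseconds, with stock Hadoop defaults of 7 days and 1 day); the sample XML and the 6-hour value in it are made up for illustration, and other services the job talks to may have their own limits. It does not by itself establish that token expiry is the cause of the exception in the question.

```python
import xml.etree.ElementTree as ET

# Stock Hadoop defaults in milliseconds, used when a property is not set.
DEFAULTS_MS = {
    "dfs.namenode.delegation.token.renew-interval": 24 * 3600 * 1000,    # 1 day
    "dfs.namenode.delegation.token.max-lifetime": 7 * 24 * 3600 * 1000,  # 7 days
}


def token_settings_ms(hdfs_site_xml):
    """Read delegation token settings from the text of an hdfs-site.xml file."""
    settings = dict(DEFAULTS_MS)
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name in settings and value:
            settings[name] = int(value)
    return settings


def job_outlives_tokens(job_hours, settings):
    """Return True if the expected run time exceeds the maximum token lifetime."""
    job_ms = job_hours * 3600 * 1000
    return job_ms > settings["dfs.namenode.delegation.token.max-lifetime"]


# Hypothetical configuration with a 6-hour maximum token lifetime.
sample = """<configuration>
  <property>
    <name>dfs.namenode.delegation.token.max-lifetime</name>
    <value>21600000</value>
  </property>
</configuration>"""

settings = token_settings_ms(sample)
print(settings)
print("8-hour job outlives tokens:", job_outlives_tokens(8, settings))
```

In practice the XML text would come from the cluster's own hdfs-site.xml rather than an inline string.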


@David_Schwab My understanding was that when a job is submitted with a keytab, the Spark Application Master periodically renews the ticket using the principal and keytab, as described here:

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cm_sg_yarn_long_jobs.html

Could the ticket refresh interval be longer than the maximum ticket lifetime?