I am submitting a Spark job and setting both the spark.yarn.keytab and spark.yarn.principal properties. The logs indicate that these properties are being picked up correctly:
2019-01-28 16:48:45 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:45 INFO Client: Attempting to login to the Kerberos using principal: USERNAME and keytab: /home/USERNAME/USERNAME.keytab
2019-01-28 16:48:58 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:58 INFO HadoopFSCredentialProvider: getting token for: hdfs://nameservice1/user/USERNAME
However, after about 8 hours of running, the job fails with the exception below, which indicates that there is no longer a valid Kerberos ticket.
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 27 more
The job consists of three stages: NLP, indexing the data, and writing to Parquet. The NLP and indexing stages complete successfully; the exception occurs during the Parquet write. I was under the impression that when a keytab is provided, the ticket remains valid for the duration of the job. Is that not the case?
(The job is submitted with SparkLauncher, pointing at a jar; this is essentially equivalent to using spark-submit.)
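For context, the submission is roughly equivalent to the following spark-submit invocation. The jar path, main class, and Kerberos realm here are placeholders, not the actual values from my job; --principal and --keytab set spark.yarn.principal and spark.yarn.keytab respectively:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal USERNAME@REALM \
  --keytab /home/USERNAME/USERNAME.keytab \
  --class MAINCLASS \
  /path/to/job.jar
```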