I am submitting a Spark job and setting both the spark.yarn.keytab and spark.yarn.principal properties. The logs indicate that these properties are being picked up correctly:
2019-01-28 16:48:45 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:45 INFO Client: Attempting to login to the Kerberos using principal: USERNAME and keytab: /home/USERNAME/USERNAME.keytab
2019-01-28 16:48:58 +0000 [INFO] from org.apache.spark.launcher.app.MAINCLASS in launcher-proc-1 - 19/01/28 16:48:58 INFO HadoopFSCredentialProvider: getting token for: hdfs://nameservice1/user/USERNAME
However, after about 8 hours of running, the job fails with the exception below, which indicates that there is no longer a valid Kerberos ticket.
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 27 more
The job consists of three stages: NLP, indexing the data, and writing to Parquet. The NLP and indexing stages complete successfully; the exception occurs during the Parquet write. I was under the impression that when a keytab is provided, the ticket remains valid for the duration of the job. Is that not the case?
(The job is submitted with SparkLauncher, pointing at a jar; this is essentially equivalent to using spark-submit.)
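For context, the submission is roughly equivalent to the following spark-submit invocation. The jar path, main class, and Kerberos realm here are placeholders, not the actual values from my job; --principal and --keytab set spark.yarn.principal and spark.yarn.keytab respectively:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal USERNAME@REALM \
  --keytab /home/USERNAME/USERNAME.keytab \
  --class MAINCLASS \
  /path/to/job.jar
```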