Does the Spark Streaming job itself need to log into Kerberos every so often? Or, if the Spark job is submitted once after kinit, will it continue to work fine for as long as it is running?
Make sure you're on HDP 2.4.2, as that release has a fix for long-running Spark jobs on a Kerberized cluster.
AFAIK, we need to submit the Spark Streaming job through the spark-submit command with the --principal and --keytab options. Once we run the job, the keytab and principal are replicated via the distributed cache to the NodeManager node where the Spark Streaming AM is running, and from then on that same AM renews the Kerberos tickets as needed using the keytab file. There was an issue with Kerberos token renewal before Spark 1.5.
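For illustration, a submission might look like the sketch below. The --principal and --keytab flags are the ones mentioned above; the principal, keytab path, class name, and JAR are placeholders, not values from this thread:

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```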
The AM creates a new token when the current one reaches 70% of its expiry time.
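In other words, the scheduling arithmetic is roughly the following (a hedged sketch with illustrative numbers, not Spark's actual code):

```scala
// Sketch of the 70%-of-lifetime renewal rule described above.
// The 24-hour lifetime is only an example value.
val issueTime  = System.currentTimeMillis()
val lifetimeMs = 24L * 60 * 60 * 1000                   // e.g. a 24-hour token lifetime
val expiryTime = issueTime + lifetimeMs
val renewAt    = issueTime + (lifetimeMs * 0.7).toLong  // renew at 70% of the lifetime
```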
Please go through the doc and JIRA below for more technical design info.
@Jitendra Yadav - the JIRA you mentioned specifically refers to YARN/HDFS, but if I use the same principal and keytab, will it also work and renew tokens for a long-running streaming job that accesses e.g. HBase/Hive? I presume they can and will use the same token?
In the Spark AM, the keytab is used to log on as the principal, and the resulting Kerberos credentials are used to acquire new instances of all default tokens (HDFS/Hive/HBase) in HDP Spark 1.6. In Apache Spark, the HBase token is not acquired; HDP carries a backport of that fix from Spark 2.0.
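For context, the "log on from a keytab, then fetch fresh delegation tokens" pattern is roughly the one sketched below using Hadoop's UserGroupInformation API. This is a hedged illustration of the pattern, not Spark's actual AM code, and the principal, keytab path, and renewer name are placeholders:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object KeytabLoginSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    UserGroupInformation.setConfiguration(conf)

    // Log in from the keytab as the principal (placeholder values).
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "myuser@EXAMPLE.COM", "/etc/security/keytabs/myuser.keytab")

    // Acquire fresh HDFS delegation tokens under that identity; Hive and
    // HBase tokens are fetched analogously through their own client APIs.
    val creds = new Credentials()
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(conf)
        fs.addDelegationTokens("yarn", creds) // renewer name is illustrative
      }
    })
  }
}
```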