What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Expert Contributor

Does the Spark Streaming job itself need to log into Kerberos every so often? Or, if the Spark job is submitted once after a kinit, will it continue to work fine for as long as it is running?


Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Mentor

Make sure you're on HDP 2.4.2, as it has a fix for long-running Spark jobs on a Kerberized cluster.

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Expert Contributor

Thanks @Artem Ervits, could you please point me to the Apache JIRA? Also, does it perform a renewal or a fresh kinit, depending on what corporate policy allows for ticket renewal and password/keytab changes?

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Mentor

@Saumil Mayani please review this SlideShare from @vshukla for the requested information, including examples and JIRAs: http://www.slideshare.net/HadoopSummit/running-spark-in-production-61337353

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

@Saumil Mayani

AFAIK you need to submit the Spark Streaming application through the spark-submit command with the --principal and --keytab options. Once the job runs, the keytab and principal are replicated via the distributed cache to the NodeManager node where the Spark Streaming ApplicationMaster (AM) is running, and that AM then renews the Kerberos tickets as required using the keytab file. There was an issue with Kerberos token renewal before Spark 1.5.
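
For reference, a minimal spark-submit sketch of that approach; the principal, keytab path, class name, and JAR below are placeholders, not values from this thread:

# principal, keytab, class, and jar are placeholder values - adjust for your cluster
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --class com.example.StreamingApp \
  my-streaming-app.jar

With --principal and --keytab supplied, the AM can re-login from the keytab and obtain fresh delegation tokens itself, instead of depending on the ticket from a one-time kinit on the submitting host.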

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Expert Contributor

@Jitendra Yadav does it perform a renewal or a fresh kinit, depending on what corporate policy allows for ticket renewal and password/keytab changes?

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

@Saumil Mayani

It creates a new token when the existing one reaches 70% of its expiry time.

Please go through the design doc and JIRA below for more technical details.

https://docs.google.com/document/d/1ECBZTprOEHPueXcG-w3GibpoWgLccHJwU62pNxYM5oU/edit#

https://issues.apache.org/jira/browse/SPARK-5342
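
For a long-running job, the same options can also be set as Spark configuration properties rather than command-line flags; the values below are placeholders:

# spark.yarn.principal / spark.yarn.keytab are the config equivalents of --principal / --keytab
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.principal=myuser@EXAMPLE.COM \
  --conf spark.yarn.keytab=/etc/security/keytabs/myuser.keytab \
  --class com.example.StreamingApp \
  my-streaming-app.jar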

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Expert Contributor

@Jitendra Yadav - the JIRA you mentioned specifically refers to YARN/HDFS, but if I use the same principal and keytab, will it work and renew for a long-running streaming job that accesses e.g. HBase/Hive? I presume they can and will use the same token?

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

@David Tam Sorry, I don't know how the Spark AM token renewal functionality differs in the case of HBase/Hive.

Re: What is correct strategy for Spark Streaming Kerberos Login For Long running applications?

Expert Contributor

In HDP Spark 1.6, the Spark AM uses the keytab to log on as the principal and uses the Kerberos credentials to acquire new instances of all the default tokens (HDFS/Hive/HBase). In Apache Spark 1.6 the HBase token is not acquired; HDP carries a backport of that fix from Spark 2.0.
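
As a rough sketch of what that looks like at submit time, assuming the Hive and HBase client configs are not already visible to the containers (all paths, the principal, the class, and the JAR below are illustrative placeholders):

# shipping hive-site.xml / hbase-site.xml with --files is one common way to make
# the client configs visible to the AM and executors; all values are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal myuser@EXAMPLE.COM \
  --keytab /etc/security/keytabs/myuser.keytab \
  --files /etc/hive/conf/hive-site.xml,/etc/hbase/conf/hbase-site.xml \
  --class com.example.StreamingApp \
  my-streaming-app.jar

The keytab is what lets the AM re-login and refresh those tokens for the lifetime of the streaming job, rather than relying on a ticket obtained once with kinit.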