Where is the documentation for spark-submit proxy user and keytab options post CDH 5.9?

Explorer

The documentation for CDH 5.9 describes the --principal, --keytab, and --proxy-user arguments to spark-submit. However, newer versions of that same documentation page (CDH 5.10, CDH 5.11, CDH 6.2) no longer mention these options. I have read conflicting advice from various sources about how to use them, so I am trying to find a definitive explanation. Where are these options documented in the newer CDH versions? Thanks.
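For reference, here is a minimal Python sketch of how these options are usually combined on YARN, assuming Kerberos is enabled on the cluster: either --principal with --keytab (so Spark can log in again and obtain fresh tokens for long-running jobs), or --proxy-user (to impersonate another user). The helper name `build_spark_submit_args`, the example principal and the keytab path are hypothetical. The check that rejects --proxy-user together with --principal mirrors upstream Spark's argument validation as far as I know, so verify it against your CDH release.

```python
from typing import List, Optional


def build_spark_submit_args(
    app: str,
    *,
    deploy_mode: str = "cluster",
    principal: Optional[str] = None,
    keytab: Optional[str] = None,
    proxy_user: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a spark-submit argv for YARN using one
    Kerberos strategy (keytab login OR impersonation, never both)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    # This sketch treats --principal and --keytab as a pair: Spark needs
    # both to re-login from the keytab for long-running applications.
    if (principal is None) != (keytab is None):
        raise ValueError("--principal and --keytab must be given together")
    # Assumption: upstream Spark rejects --proxy-user combined with
    # --principal, so fail early here as well.
    if proxy_user is not None and principal is not None:
        raise ValueError("use either --proxy-user or --principal/--keytab, not both")

    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if principal is not None and keytab is not None:
        args += ["--principal", principal, "--keytab", keytab]
    if proxy_user is not None:
        # The submitting user must be allowed to impersonate proxy_user
        # (Hadoop proxy-user settings) for this to work on the cluster.
        args += ["--proxy-user", proxy_user]
    args.append(app)
    return args
```

For example, `build_spark_submit_args("app.jar", proxy_user="alice", deploy_mode="client")` yields a client-mode command that runs the job as `alice`, while passing `principal=` and `keytab=` instead yields the long-running variant.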

3 Replies

Contributor

There have been some changes in the documentation. Similar statements now appear on a new page that is specific to long-running Spark on YARN jobs in cluster mode:

https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_sg_yarn_long_jobs.html

For jobs that run less than 7 days (the default maximum lifetime of HDFS delegation tokens), you should be able to simply log in to the KDC with the kinit command and then run the job.
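A minimal Python sketch of that short-job workflow, assuming the MIT Kerberos client tools (`klist`, `kinit`) and `spark-submit` are on the PATH. The function name `submit_short_job`, the `run` hook, and the example principal and keytab path are hypothetical. It checks for a valid ticket with `klist -s`, runs `kinit` if needed (from a keytab when one is given, otherwise interactively), and only then calls `spark-submit`, without passing --principal/--keytab to Spark.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes a command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def _default_runner(cmd: Sequence[str]) -> int:
    # Return the exit status; non-zero statuses are not raised as
    # exceptions because check=False is the default.
    return subprocess.run(list(cmd)).returncode


def submit_short_job(
    spark_args: Sequence[str],
    principal: str,
    keytab: Optional[str] = None,
    run: Runner = _default_runner,
) -> int:
    """Hypothetical helper: ensure a Kerberos TGT exists, then run
    spark-submit. Intended for jobs that finish within the token lifetime."""
    # MIT `klist -s` prints nothing and exits 0 only if a credentials
    # cache with unexpired tickets exists.
    if run(["klist", "-s"]) != 0:
        if keytab is not None:
            # Non-interactive login from a keytab file.
            kinit_cmd = ["kinit", "-kt", keytab, principal]
        else:
            # Interactive login: kinit prompts for the password.
            kinit_cmd = ["kinit", principal]
        status = run(kinit_cmd)
        if status != 0:
            return status
    return run(["spark-submit", *spark_args])
```

The `run` parameter exists only so the command sequence can be inspected or tested without a real cluster; in normal use you would call `submit_short_job(["--master", "yarn", "app.jar"], "etl@EXAMPLE.COM", "/path/to/etl.keytab")`.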

Explorer

Thanks for the response. Where is the equivalent "Configuring Spark on YARN for Long-Running Applications" page for CDH 6.1.x or the latest release? I am not seeing it here or here. Also, what about --proxy-user? Is it still a supported parameter in CDH 5.10 and later?

New Contributor

Hi Yuexin,

Did you manage to find a resolution for this? I am unable to run a Spark job with --proxy-user in YARN cluster mode. However, it runs successfully in yarn-client mode.

This is with the CDH 6.2.1 version of Spark.

There is no problem using the open-source version of Spark with --proxy-user in either client or cluster mode.

Thanks