Created on 04-26-2019 07:35 AM - edited 09-16-2022 07:20 AM
The documentation for CDH 5.9 talks about the --principal, --keytab, and --proxy-user arguments to spark-submit. However, the newer versions of that same doc page (CDH 5.10, CDH 5.11, CDH 6.2) don't mention these options at all. I have read conflicting things about how to use these options from various sources, so I am trying to get a definitive explanation of them. Where are these options documented in the newer CDH versions? Thanks.
Created 04-28-2019 12:06 AM
There are some changes in the documentation, and we have similar statements on a new page that is more specific to long-running Spark on YARN jobs in cluster mode:
https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_sg_yarn_long_jobs.html
For jobs that run for less than 7 days (that is the default lifetime of a ticket), you should be able to just log in to the KDC using the "kinit" command and then run the job.
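To make the two cases concrete, here is a rough sketch in Python that only builds the command lines (it does not run anything). The helper names, the principal alice@EXAMPLE.COM, the keytab path, and my_app.py are all made up for illustration. The flags themselves (--principal, --keytab, --proxy-user) are the ones discussed in this thread. As far as I know, spark-submit refuses --proxy-user combined with --principal, so the sketch rejects that combination too; treat that as an assumption and check it against your version.

```python
import shlex


def kinit_command(principal, keytab=None):
    """Build a kinit command; with a keytab it does not prompt for a password."""
    cmd = ["kinit"]
    if keytab:
        cmd += ["-kt", keytab]
    cmd.append(principal)
    return cmd


def spark_submit_command(app, deploy_mode="cluster",
                         principal=None, keytab=None, proxy_user=None):
    """Build a spark-submit command for YARN (hypothetical helper).

    Short job: leave principal/keytab unset and rely on the kinit ticket.
    Long-running job: pass principal and keytab together.
    """
    if (principal is None) != (keytab is None):
        raise ValueError("--principal and --keytab must be given together")
    if proxy_user and principal:
        # Assumption: spark-submit rejects this combination.
        raise ValueError("use either --proxy-user or --principal, not both")
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if principal:
        cmd += ["--principal", principal, "--keytab", keytab]
    if proxy_user:
        cmd += ["--proxy-user", proxy_user]
    cmd.append(app)
    return cmd


def render(cmd):
    """Quote the arguments so the line can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Short job: kinit first, then a plain submit.
    print(render(kinit_command("alice@EXAMPLE.COM")))
    print(render(spark_submit_command("my_app.py")))
    # Long-running job: hand the principal and keytab to spark-submit.
    print(render(spark_submit_command(
        "my_app.py",
        principal="alice@EXAMPLE.COM",
        keytab="/path/to/alice.keytab")))
```

For the short-job case the second printed line is just a plain spark-submit with no Kerberos options, since the ticket obtained by kinit is what gets used.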
Created 04-29-2019 09:18 AM
Created 03-08-2021 03:40 AM
Hi Yuexin
Did you manage to get any resolution for this? I am unable to run a Spark job with --proxy-user in YARN cluster mode. However, I can run it successfully in yarn-client mode.
This is with the CDH 6.2.1 version of Spark.
There is no problem when using the open-source version of Spark with --proxy-user in either client or cluster mode.
Thanks