User impersonation in Spark and Samza

Master Guru

When a user submits a job via Spark/Samza to YARN, the job gets executed as the "yarn" user. How can we make sure the job runs as the same user who submitted it?

Please advise.

1 ACCEPTED SOLUTION

Expert Contributor

@Kuldeep Kulkarni

I believe this needs to be handled with Kerberos. If the cluster is not Kerberized, the job will be submitted as yarn.

5 REPLIES

Master Guru

@Ian Roberts

I believe we can do something like this:

For example, if you are running spark-shell, you can add the configuration below to core-site.xml and run your job with --proxy-user <username>. Note that the name inside the property keys is the account that submits the job and is allowed to impersonate (shown here as <superuser>), while <username> is the user the job should run as.

<property>
  <name>hadoop.proxyuser.<superuser>.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.<superuser>.groups</name>
  <value>*</value>
</property>

Command to run spark-shell on YARN with a proxy user:
spark-shell --master yarn-client --proxy-user <username>
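
A wildcard in both values lets the impersonating account act as any user from any host. In Hadoop's proxy-user settings, the name embedded in the property key is the account that is allowed to impersonate, and each value can be a comma-separated list instead of *. As a rough sketch, the helper below prints a narrower pair of properties. The function name render_proxyuser_props, the account svc_spark, the host gateway1.example.com, and the group analysts are all made-up placeholders, not values from this thread.

```shell
# Hypothetical helper: prints the two proxy-user properties for core-site.xml.
# $1 = account allowed to impersonate, $2 = allowed hosts, $3 = allowed groups.
# Hosts and groups fall back to "*" when omitted.
render_proxyuser_props() {
  superuser="$1"
  hosts="${2:-*}"
  groups="${3:-*}"
  cat <<EOF
<property>
  <name>hadoop.proxyuser.${superuser}.hosts</name>
  <value>${hosts}</value>
</property>
<property>
  <name>hadoop.proxyuser.${superuser}.groups</name>
  <value>${groups}</value>
</property>
EOF
}

# Placeholder values: "svc_spark" may impersonate members of "analysts"
# when connecting from gateway1.example.com.
render_proxyuser_props svc_spark gateway1.example.com analysts
```

The printed properties would still have to be merged into core-site.xml by hand, and the cluster has to pick up the change before --proxy-user honours it.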

Explorer

It didn't work for me. I am getting the exception below:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): <proxyuser> tries to renew a token with renewer <loggeduser>

Contributor

Is it possible to follow the above approach in a Kerberos environment? I tried the above steps to run a job as a proxy user, but it failed with a GSS initialization exception. Any pointers?

avatar

Note that even when the process runs as the OS user "yarn", an environment variable, HADOOP_USER_NAME, passes the name of the account that submitted the work into that process. The HDFS client picks it up, so the code should be able to work with HDFS directories as the submitter, with the same permissions. That is, as you may have guessed, completely insecure and open to abuse; to fix that you need to make the leap to Kerberos, I'm afraid.
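
To see why this is open to abuse, here is a small sketch of what any client can do on a cluster that uses simple (non-Kerberos) authentication. The user name alice and the path /user/alice are placeholders, and the snippet assumes the hdfs command-line client is on the PATH; it skips the call otherwise.

```shell
# Sketch only: applies to clusters using simple (non-Kerberos) authentication.
# "alice" is a placeholder user name, not one from this thread.
submitter="alice"

if command -v hdfs >/dev/null 2>&1; then
  # Under simple auth the Hadoop client takes its identity from
  # HADOOP_USER_NAME, so this listing is performed as "alice" no matter
  # which OS account runs the command.
  HADOOP_USER_NAME="$submitter" hdfs dfs -ls "/user/$submitter"
else
  echo "hdfs client not found; skipping"
fi
```

Nothing checks that the caller really is that user, which is the gap Kerberos closes.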