Support Questions

Find answers, ask questions, and share your expertise

YARN forces the "nobody" user on all jobs (and so they fail)

Explorer

Hi,
I am having some problems with YARN, and this is not the first cluster where it happens, so I don't understand what I am doing wrong. Every night I shut down the clusters (installed on AWS and SoftLayer) to avoid paying for them while they are idle. Also, sooner or later I need bigger machines, so I change the AWS instance type (or the SoftLayer equivalent). At some point that I cannot pin down, after a particular restart, YARN starts having problems with the NodeManager user cache directory (e.g. /bigdata1/yarn/nm/usercache/m.giusto), as in this case (https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-Error-Sqoop-2-not-working-through...), and I am forced to remove everything from all the user cache directories (acceptable); otherwise jobs are unable to start.
However, the bigger problem is that YARN also starts applying an unwanted rule: every user who submits a job is treated as not allowed, and YARN starts the job as "nobody" (the default value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user). This happens for a regular user like "m.giusto" (UID over 1000) and also for "hdfs" (UID below 500). I have tried moving "hdfs" from "banned.users" to "allowed.system.users" and setting "min.user.id" to 0, with no change. Moreover, the "nobody" user cannot write to the real user's cache directory (permission denied), so the job fails.

 

main : user is nobody
main : requested yarn user is m.giusto
Can't create directory /bigdata1/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Can't create directory /bigdata2/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Did not create any app directories
.Failing this attempt.. Failing the application.

 

What I do not understand is why the system starts applying these rules and how to fix it. At the moment the only solution is to reinstall the cluster.

 

Some other info: the OS is CentOS 6.6; the tested CDH versions are 5.2.1, 5.3.1 and 5.3.2.

 


Thanks,
Michele

1 ACCEPTED SOLUTION

Mentor
You can force the LinuxContainerExecutor (LCE) to impersonate the actual
users in non-secure mode by setting
"yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users"
to false (the default is true, which causes it to run all users' jobs
as "nobody"). Note though that enabling this requires that your
users' accounts exist on all nodes in the cluster at the Unix level
(i.e. the id command must return a valid ID for them, or their jobs will not work).

If you instead just want to change the "nobody" user to some other
static user, the config for that is
"yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user"
(default: "nobody"). If you go this way, you need to leave the
limit-users config mentioned earlier at its default of true.

These configs need to be changed in CM's YARN configuration page
(either via direct fields if available, or via the yarn-site.xml), and
they must reach the NodeManager's configs to apply.
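
As a rough sketch of what the yarn-site.xml entries could look like (the
property names are the ones mentioned above; the "yarnlocal" account name
is purely hypothetical, and where exactly you paste this in CM depends on
your version, so treat it as an illustration rather than the exact
procedure):

```xml
<!-- Option A: run containers as the submitting user.
     Assumes every user account exists on every NodeManager host. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>

<!-- Option B (alternative to A): leave limit-users at its default of true
     and only change the static user. "yarnlocal" is a hypothetical account
     name used for illustration; it would have to exist on all nodes.
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>yarnlocal</value>
</property>
-->
```

Only one of the two options should be active at a time, and the
NodeManagers need a restart with the new configuration deployed before
the change takes effect.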

These configs are also documented in the yarn-default.xml:
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Does this help?


11 REPLIES

New Contributor

Hey Lance,

 

I am also stuck on the same issue. Can you please tell me how you changed the umask for the yarn user?

 

Thanks,

Ankur

Explorer

I don't remember how I did this.