Explorer
Posts: 9
Registered: ‎08-05-2014
Accepted Solution

YARN force nobody user on all jobs (and so they fail)

Hi,
I am having problems with YARN, and this is not the first cluster where it has happened, so I don't understand what I am doing wrong. Every night I shut down the clusters (installed on AWS and SoftLayer) to avoid paying for them while nobody is working. Sooner or later I also need bigger machines, so I change the AWS instance type (and the equivalent on SoftLayer). At some point that I cannot pin down, after a particular restart, YARN starts having problems with the NodeManager user cache directory (e.g. /bigdata1/yarn/nm/usercache/m.giusto), as in this case (https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-Error-Sqoop-2-not-working-through...), and I am forced to remove everything from all the user cache directories (which is acceptable); otherwise jobs are unable to start.
However, the bigger problem is that YARN also starts applying an unwanted rule under which every user who submits a job is treated as not allowed, and YARN starts the job as "nobody" (the default value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user). This happens both for a regular user like "m.giusto" (UID over 1000) and for "hdfs" (UID below 500). I have tried moving "hdfs" from "banned.users" to "allowed.system.users" and setting "min.user.id" to 0, with no change. Moreover, the "nobody" user cannot write to the real user's cache directory (permission denied), so the job fails.

 

main : user is nobody
main : requested yarn user is m.giusto
Can't create directory /bigdata1/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Can't create directory /bigdata2/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Did not create any app directories
.Failing this attempt.. Failing the application.

 

What I do not understand is why the system starts applying these rules and how to fix it. At the moment the only solution I have found is to reinstall the cluster.

 

Some other information: the OS is CentOS 6.6, and the CDH versions tested are 5.2.1, 5.3.1 and 5.3.2.

 


Thanks,
Michele

Cloudera Employee
Posts: 12
Registered: ‎03-31-2015

Re: YARN force nobody user on all jobs (and so they fail)

Hi Michele

 

It sounds like YARN is running under the Linux Container Executor (LCE), which is used to provide secure containers or resource isolation (using cgroups). If you don't need these features, or if this was enabled accidentally, then you can probably fix the problem by unchecking "Always Use Linux Container Executor" in the YARN configuration in Cloudera Manager.

 

Or, if you do need the LCE, then one thing to check is that your local users (e.g. m.giusto) exist on all nodes.

 

Dave

Explorer
Posts: 9
Registered: ‎08-05-2014

Re: YARN force nobody user on all jobs (and so they fail)

Hi Dave,
thanks for the quick answer. You are right, the problem is that the "Always Use Linux Container Executor" flag was enabled. I have unchecked it and now things seem to be working.

However, the description of the "Always Use Linux Container Executor" flag in Cloudera Manager says "Cgroups enforcement only works when the Linux Container Executor is used", so if I want to use a "Static Resource Pool" where YARN gets only X% of the resources, I have to keep the flag enabled (now I also understand when the flag got checked: after I first configured the resource pool). So I installed what is needed for cgroups (libcgroup) and re-enabled the flag.
Now if I run a YARN application (like a Hive query) everything works. If instead I run an Oozie job containing a shell action, the shell action is executed by the "nobody" user (the real Oozie user is "m.giusto"). Normally shell actions are executed as "yarn", so I added "yarn" to "allowed.system.users" and removed it from "banned.users". The "nobody" user still remains the MR user. Any ideas?

 


Michele

Posts: 1,825
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: YARN force nobody user on all jobs (and so they fail)

You can force the LCE to impersonate actual users in non-secure mode
by setting "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users"
to false (default is true, which causes it to limit all users to
"nobody"). Note though that enabling this ability requires that your
users' accounts exist on all nodes in the cluster at the Unix level
(i.e. the id command must return a valid ID for them, or their jobs will not work).

If you instead just wanted to change the "nobody" user to some other
static user, the config for that is
"yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user"
(default: "nobody"). If you go this way, you need to leave the earlier
mentioned config as its default of true.

These configs need to be changed in CM's YARN configuration page
(either via direct fields if available, or via the yarn-site.xml), and
they must reach the NodeManager's configs to apply.
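
As a rough sketch of the first option, assuming the property is added through a yarn-site.xml safety valve that reaches the NodeManager, the fragment might look like the following. The property name is the one mentioned above; exactly which CM field or safety valve to use depends on your CM version, so treat that part as an assumption:

```xml
<!-- Sketch only: lets the LCE run containers as the submitting user in
     non-secure mode. Assumes the user's account exists on every node. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
```

For the second option the fragment would have the same shape, but with the "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" property set to the static account you want, and with limit-users left at its default of true.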

These configs are also documented in the yarn-default.xml:
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

Does this help?

Explorer
Posts: 9
Registered: ‎08-05-2014

Re: YARN force nobody user on all jobs (and so they fail)

Hi Harsh,

thanks for the reply. I didn't know about the "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" property. I will explore the two options and choose the better one for the current experiments.

 

 

Thanks,

Michele

 

New Contributor
Posts: 4
Registered: ‎12-14-2015

Re: YARN force nobody user on all jobs (and so they fail)


I had to fix this by changing the 'yarn' user to have a umask of '0'.

 

I would suggest adding this fix to Cloudera Manager.

 

Cheers,

 

Lance Norskog

 

New Contributor
Posts: 2
Registered: ‎05-09-2016

Re: YARN force nobody user on all jobs (and so they fail)

Hey Lance,

 

I am also stuck with the same issue. Can you please tell me how you changed the umask for the yarn user?

 

Thanks,

Ankur

New Contributor
Posts: 4
Registered: ‎12-14-2015

Re: YARN force nobody user on all jobs (and so they fail)

I don't remember how I did this.

Expert Contributor
Posts: 64
Registered: ‎11-04-2016

Re: YARN force nobody user on all jobs (and so they fail)

Hi,

 

I have followed the instructions to make the containers run as the Linux user who launches the application, but the actual user is still nobody. This is what I see:

main : run as user is nobody
main : requested yarn user is panahi

I have OpenLDAP syncing users and groups to all the nodes across the cluster. My only problem is that the YARN containers are launched either by yarn, in a non-secure cluster with default values, or by nobody, when you set "Limit Nonsecure Container Executor Users" to false and enable the Linux Container Executor ("yarn.nodemanager.container-executor.class").

 

Although the Spark application itself runs as the user who submitted it, the result of this snippet, in which Spark calls an external command, is always either yarn or nobody:

 

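// Run in spark-shell, where sc (the SparkContext) is already defined.
val test = sc.parallelize(Seq("test user")).repartition(1)
// Pipe the single partition through whoami to see which OS user runs the task.
val piped = test.pipe(Seq("whoami"))
val c = piped.collect()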

 

Posts: 1,825
Kudos: 406
Solutions: 292
Registered: ‎07-31-2013

Re: YARN force nobody user on all jobs (and so they fail)

@maziyar,

Are you sure you have "Always Use Linux Container Executor" checked and
"Limit Nonsecure Container Executor Users" unchecked, and no safety valves
overriding relevant properties? What CDH and CM version are you running?