
YARN force nobody user on all jobs (and so they fail)

Explorer

Hi,
I am having some problems with YARN, and this is not the first cluster where it has happened, so I don't get what I am doing wrong. Every night I shut down the clusters (installed on AWS and SoftLayer) to avoid spending money while not working. Also, sooner or later I need bigger machines, so I change the AWS instance type (similarly for SoftLayer). At some point that I cannot pin down, after a particular restart YARN starts generating problems in the NodeManager user cache directory (e.g. /bigdata1/yarn/nm/usercache/m.giusto), like in this case (https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-Error-Sqoop-2-not-working-through... and I am forced to remove everything from all the user cache directories (acceptable), otherwise jobs are unable to start.
However, the bigger problem is that YARN also starts applying an undesired rule under which every user who submits a job is treated as not allowed, and YARN starts the job as "nobody" (the default value of yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user). This happens for a non-super-user like "m.giusto" (UID over 1000) and also for "hdfs" (UID less than 500). I have tried moving "hdfs" from "banned.users" to "allowed.system.users" and setting "min.user.id" to 0, with no change. Moreover, the "nobody" user is not able to write to the real user's cache folder (permission denied), and so the job fails.
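
(For reference, if I read the defaults right, the setting that produces "nobody" would look like this in yarn-site.xml terms:)

<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
  <value>nobody</value>
</property>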

 

main : user is nobody
main : requested yarn user is m.giusto
Can't create directory /bigdata1/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Can't create directory /bigdata2/yarn/nm/usercache/m.giusto/appcache/application_1427799738120_0001 - Permission denied
Did not create any app directories
Failing this attempt. Failing the application.

 

What I do not get is why the system starts applying these rules, and how to fix it. At the moment the only solution is to reinstall the cluster.

 

Some other info: the OS is CentOS 6.6; tested CDH versions are 5.2.1, 5.3.1, and 5.3.2.

 


Thanks,
Michele


11 REPLIES

Contributor

Hi Michele

 

It sounds like YARN is running under the Linux container executor, which is used to provide secure containers or resource isolation (using cgroups). If you don't need these features, or if this was enabled accidentally, then you can probably fix the problem by unchecking "Always Use Linux Container Executor" in the YARN configuration under Cloudera Manager.
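
If it helps to see it at the yarn-site.xml level, that checkbox should correspond to this property (the value below is the default, non-LCE executor):

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>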

 

Or, if you do need the LCE, then one thing to check is that your local users (e.g. m.giusto) exist on all nodes.

 

Dave

Explorer

Hi Dave,
thanks for the quick answer. You are right: the "Always Use Linux Container Executor" flag was enabled. I have unchecked it and now things seem to be working.

However, the Cloudera Manager description of the "Always Use Linux Container Executor" flag says "Cgroups enforcement only works when the Linux Container Executor is used", so if I want to use the desired "Static Resource Pools" feature, where YARN gets only X% of the resources, I have to keep the flag enabled (now I also understand when the flag got checked: after making the first configuration of the resource pools...). So I have installed what is needed for cgroups (libcgroup) and re-enabled the flag.
Now if I execute a YARN application (like a Hive query), everything works. If instead I try to execute an Oozie job with a shell action inside, the shell action is executed by the "nobody" user (the real Oozie user is "m.giusto"). Normally shell actions are executed as "yarn", so I have added "yarn" to "allowed.system.users" and removed it from "banned.users", but "nobody" remains the MR user. Any idea?

 


Michele

Mentor (accepted solution)
You can force the LCE to impersonate actual users in non-secure mode
by setting "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users"
to false (the default is true, which causes it to limit all users to
"nobody"). Note though that enabling this ability requires that your
users' accounts exist on all nodes in the cluster at the Unix level
(i.e. id must return a valid ID for their jobs to work).

If you instead just want to change the "nobody" user to some other
static user, the config for that is
"yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user"
(default: "nobody"). If you go this way, you need to leave the
earlier-mentioned config at its default of true.

These configs need to be changed on CM's YARN configuration page
(either via direct fields if available, or via the yarn-site.xml
safety valve), and they must reach the NodeManager's configs to apply.

These configs are also documented in the yarn-default.xml:
http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
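
For example, the first approach would look like this in yarn-site.xml:

<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>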

Does this help?

Explorer

Hi Harsh,

thanks for the reply. I didn't know about the "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users" property. I will explore the two options and choose the better one for the current experiments.

 

 

Thanks,

Michele

 

Expert Contributor

Hi,

 

I have followed the instructions to let the Linux container executor run the container as the user who launched the application, but I can still see that the actual user is nobody, even though the log shows this:

main : run as user is nobody
main : requested yarn user is panahi

I have OpenLDAP syncing users and groups on all the nodes across the cluster. My only problem is that the YARN containers are launched either as "yarn" (in a non-secure cluster with default values) or as "nobody" (when "Limit Nonsecure Container Executor Users" is set to false and "yarn.nodemanager.container-executor.class" is set to the Linux container executor).

 

Although Spark itself runs as the user who submitted it, the result of this snippet, in which Spark calls another executable, is always either "yarn" or "nobody" in every situation:

 

val test = sc.parallelize(Seq("test user")).repartition(1)
val piped = test.pipe(Seq("whoami"))
val c = piped.collect()
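// Note: collect() brings back whoami's output from inside the container;
// here it always reports "nobody" or "yarn" instead of the submitting user.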

 

Mentor
@maziyar,

Are you sure you have "Always Use Linux Container Executor" checked and
"Limit Nonsecure Container Executor Users" unchecked, and no safety valves
overriding the relevant properties? What CDH and CM versions are you running?

Expert Contributor

Hi @Harsh J

 

Thanks for your response. Yes, those are the two configs as you described, and I also checked all the "safety valves"; there is nothing related to any Linux container executor or cgroups settings:

 

[screenshots: the two settings as configured in the CM YARN configuration page]

 

I have even removed the "nobody" user from the allowed users and left "nonsecure-mode.local-user" empty, but it still says "nobody". If I revert all the changes, it says "yarn". So these configs do affect something somewhere.

 

Cloudera Express: 5.15.1

Java Version: 1.8.0_181

CDH: 5.15.1-1.cdh5.15.1.p0.4

 

 

UPDATE: one more thing that might be useful: when I download the client configuration from CM, I can't find these two configs anywhere in it. Not sure if that is normal.

 

Best,

Maziyar

Expert Contributor

Hi @Harsh J

 

Your mentioning the Safety Valve gave me an idea! I thought maybe the CM UI was not setting one or both of those key/value pairs, so I set them manually and it worked! Now every container requested via Spark's pipe() runs as the same owner as the Spark application itself (no more "nobody" or "yarn"!). There must be something in the UI that doesn't map one of those two configs back to yarn-site.xml:

 

[screenshot: the manually added yarn-site.xml entries in the NodeManager safety valve]
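
Something along these lines (an illustrative sketch; the exact entries are the ones in the screenshot):

<property>
  <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>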

Explorer

I had to fix this by changing the 'yarn' user to have a umask of '0'.

 

I would suggest adding this fix to Cloudera Manager.

 

Cheers,

 

Lance Norskog