Created on 02-14-2017 05:34 AM - edited 09-16-2022 04:05 AM
Hi,
I am trying to build Kerberos-enabled clusters using Cloudera Director. During First Run, pretty much all services come online except YARN; HDFS, Hue, ZooKeeper and Kafka are all fine.
When bringing up the NodeManager I see the following in the role logs:
Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:251)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:198)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:249)
    ... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
    ... 4 more
Looking on this node I see:
[root@dn0 ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r--. 13 cloudera-scm cloudera-scm 318 Jan 20 21:38 /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/etc/hadoop/conf.empty/container-executor.cfg
-r--------. 1 root hadoop 156 Feb 14 12:13 /run/cloudera-scm-agent/process/52-yarn-NODEMANAGER/container-executor.cfg
-r--------. 1 root hadoop 156 Feb 14 12:13 /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
find: ‘/proc/17426’: No such file or directory
[root@dn0 ~]# ll /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/
total 52
-rw-r--r--. 1 root root     20 Feb 14 12:10 __cloudera_generation__
-rw-r--r--. 1 root root     67 Feb 14 12:10 __cloudera_metadata__
-r--------. 1 root hadoop  156 Feb 14 12:13 container-executor.cfg
-rw-r--r--. 1 root root   3895 Feb 14 12:10 core-site.xml
-rw-r--r--. 1 root root    617 Feb 14 12:10 hadoop-env.sh
-rw-r--r--. 1 root root   2684 Feb 14 12:10 hdfs-site.xml
-rw-r--r--. 1 root root    314 Feb 14 12:10 log4j.properties
-rw-r--r--. 1 root root   5011 Feb 14 12:10 mapred-site.xml
-rw-r--r--. 1 root root    315 Feb 14 12:10 ssl-client.xml
-rw-r--r--. 1 root hadoop  684 Feb 14 12:13 topology.map
-rwxr-xr-x. 1 root hadoop 1594 Feb 14 12:13 topology.py
-rw-r--r--. 1 root root   3872 Feb 14 12:10 yarn-site.xml
And /etc/hadoop looks like:
[root@dn0 ~]# ll /etc/hadoop
total 8
lrwxrwxrwx. 1 root root   29 Feb 14 12:10 conf -> /etc/alternatives/hadoop-conf
drwxr-xr-x. 2 root root 4096 Feb 14 12:10 conf.cloudera.CD-HDFS-gbUrTxBt
drwxr-xr-x. 2 root root 4096 Feb 14 12:13 conf.cloudera.CD-YARN-VAJUGMaj
[root@dn0 ~]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx. 1 root root 42 Feb 14 12:10 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj
The YARN process presumably runs as the yarn user, so for some reason the wrong permissions are being set on container-executor.cfg.
Just out of interest, the contents are:
[root@dn0 ~]# cat /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
allowed.system.users=nobody,impala,hive,llama,hbase
banned.users=hdfs,yarn,mapred,bin
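For anyone comparing their own nodes, a small helper along these lines prints mode, owner and group for each copy of the file in one glance (a sketch using GNU coreutils `stat -c`; the paths shown earlier in this post are what I would pass on dn0):

```shell
# check_cfg: print "mode owner:group path" for each file given, so the
# various copies of container-executor.cfg can be compared side by side.
check_cfg() {
  for f in "$@"; do
    stat -c '%a %U:%G %n' "$f"
  done
}

# e.g. on dn0:
#   check_cfg /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg \
#             /run/cloudera-scm-agent/process/52-yarn-NODEMANAGER/container-executor.cfg
```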
When I look on our other cluster that doesn't use Kerberos and Cloudera Director, I see the following permissions:
[root@????? ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r-- 1 root root 318 Jun 1 2016 /log/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/etc/hadoop/conf.empty/container-executor.cfg
-r--r--r-- 1 root hadoop 0 Jan 23 05:37 /etc/hadoop/conf.cloudera.yarn/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:37 /var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_593636475737667066/yarn-conf/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:06 /var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_-6875379618642481202/yarn-conf/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:37 /var/run/cloudera-scm-agent/process/1056-yarn-NODEMANAGER/container-executor.cfg
[root@????? ~]# ll /etc/hadoop
total 8
lrwxrwxrwx 1 root root   29 Jan 31 08:29 conf -> /etc/alternatives/hadoop-conf
drwxr-xr-x 2 root root 4096 Jan 23 05:37 conf.cloudera.hdfs
drwxr-xr-x 2 root root 4096 Jan 31 08:29 conf.cloudera.yarn
[root@????? ~]#
These look more reasonable.
Can anybody give me a clue how these permissions are getting (or not getting) set? Since this is Cloudera Director, it's out of my control how they are being set.
Created 02-14-2017 01:29 PM
Created 02-14-2017 07:29 AM
I used Cloudera Director to build a cluster without Kerberos. YARN came up okay and the permissions were the following:
[root@dn2 ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r--. 13 cloudera-scm cloudera-scm 318 Jan 20 21:38 /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/etc/hadoop/conf.empty/container-executor.cfg
-r--------. 1 root hadoop 0 Feb 14 15:16 /run/cloudera-scm-agent/process/44-yarn-NODEMANAGER/container-executor.cfg
-r--------. 1 root hadoop 0 Feb 14 15:16 /etc/hadoop/conf.cloudera.CD-YARN-uMqvpvqg/container-executor.cfg
They are the same permissions, so it seems the permissions are not the issue. Perhaps it is the contents.
Any clues?
Created 02-14-2017 09:05 AM
Actually, digging into this a bit more, I think it is the permissions on container-executor.cfg that are causing the issue.
The NodeManager is launched as the yarn user:
yarn 17040 17035 0 16:53 ? 00:00:00 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/yarn/yarn.sh nodemanager
And from here http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/SecureContainer.html:
On Linux environment the secure container executor is the LinuxContainerExecutor. It uses an external program called the container-executor to launch the container. This program has the setuid access right flag set which allows it to launch the container with the permissions of the YARN application user.
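Given that, the setuid bits on the binary itself are worth checking. A quick sketch, assuming the CDH 5.10 parcel layout from this cluster (the path and the root:yarn/6050 expectation are my assumptions about a healthy install, not something I've confirmed here):

```shell
# Inspect the container-executor binary: on a working secure cluster it
# should be root-owned with the setuid/setgid bits set (mode ~6050),
# so it can run privileged regardless of which user launches it.
CE=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop-yarn/bin/container-executor
if [ -e "$CE" ]; then
  ls -l "$CE"
  # a healthy binary looks roughly like:
  # ---Sr-s--- 1 root yarn ... container-executor
fi
```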
This would explain why this is just happening to secure clusters built by CD.
It seems that container-executor.cfg is created and populated at NodeManager restart time, so I cannot change permissions on the cfg file to test.
Is there a reason why these cfg files are created with 400 and not 444? Should they be 444 on secure clusters? Can this be changed, and where?
Thanks
Created on 02-15-2017 01:34 AM - edited 02-15-2017 01:35 AM
Hi,
Looking at that document, I see:
conf/container-executor.cfg The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above. The configuration file must be owned by the user running NodeManager (user yarn in the above example), group-owned by anyone and should have the permissions 0400 or r--------.
This makes sense: if the container-executor runs as yarn, how else could it read the configuration?
Does anyone have a running kerberos cluster to confirm the permissions?
Created 02-15-2017 08:07 AM
Okay, I have it.
I was using the parcel_provisioner.sh script to preload the parcels into Docker images. However, during the pre-extraction the permissions on the container-executor weren't being set properly. For now, turning off the pre-extraction works. I'll also test by manually setting the perms, but I'm wondering how many other permissions aren't set properly.
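For the manual test, something like this is what I have in mind (a hypothetical fix-up run as root in the image; the path follows the CDH 5.10 parcel layout above, and root:yarn with mode 6050 is what I'm assuming an agent-managed parcel normally ends up with):

```shell
# Restore the ownership and setuid/setgid bits on container-executor
# that the pre-extraction step appears to drop.
PARCEL=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41
CE="$PARCEL/lib/hadoop-yarn/bin/container-executor"
if [ -e "$CE" ]; then
  chown root:yarn "$CE"   # root-owned, group = NM's linux-container-executor.group
  chmod 6050 "$CE"        # ---Sr-s---: setuid root + setgid, no world access
fi
```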
FYI, the root:hadoop 400 permissions work because of the setuid flag on the container-executor binary. Now everything makes sense.
Thanks for the help!