
Building a kerberised cluster with Director: YARN NodeManager reports "Invalid conf file provided"

Contributor

Hi,

 

I am trying to build Kerberos-enabled clusters using Cloudera Director. During First Run pretty much all services come online except YARN; HDFS, Hue, ZooKeeper and Kafka are all fine.

 

When bringing up the NodeManager, I see the following in the role logs:

 

Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:251)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:198)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:249)
	... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg 

	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
	at org.apache.hadoop.util.Shell.run(Shell.java:504)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
	... 4 more
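
If I understand the executor's checks correctly, it refuses the config unless the file and every directory above it are owned by root and not writable by group or other, so a quick sanity check of the whole path (plain stat, run as root on the node) seems worth doing:

# Walk from the cfg file up towards /, printing mode and ownership at each level.
# Assumption: the executor rejects the config if anything in this chain is
# not root-owned or is group/other-writable.
f=/etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
while [ "$f" != "/" ]; do stat -c '%A %U:%G %n' "$f"; f=$(dirname "$f"); done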

 

Looking on this node, I see:

[root@dn0 ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r--. 13 cloudera-scm cloudera-scm 318 Jan 20 21:38 /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/etc/hadoop/conf.empty/container-executor.cfg
-r--------. 1 root hadoop 156 Feb 14 12:13 /run/cloudera-scm-agent/process/52-yarn-NODEMANAGER/container-executor.cfg
-r--------. 1 root hadoop 156 Feb 14 12:13 /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
find: ‘/proc/17426’: No such file or directory
[root@dn0 ~]# ll /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/
total 52
-rw-r--r--. 1 root root     20 Feb 14 12:10 __cloudera_generation__
-rw-r--r--. 1 root root     67 Feb 14 12:10 __cloudera_metadata__
-r--------. 1 root hadoop  156 Feb 14 12:13 container-executor.cfg
-rw-r--r--. 1 root root   3895 Feb 14 12:10 core-site.xml
-rw-r--r--. 1 root root    617 Feb 14 12:10 hadoop-env.sh
-rw-r--r--. 1 root root   2684 Feb 14 12:10 hdfs-site.xml
-rw-r--r--. 1 root root    314 Feb 14 12:10 log4j.properties
-rw-r--r--. 1 root root   5011 Feb 14 12:10 mapred-site.xml
-rw-r--r--. 1 root root    315 Feb 14 12:10 ssl-client.xml
-rw-r--r--. 1 root hadoop  684 Feb 14 12:13 topology.map
-rwxr-xr-x. 1 root hadoop 1594 Feb 14 12:13 topology.py
-rw-r--r--. 1 root root   3872 Feb 14 12:10 yarn-site.xml

And /etc/hadoop looks like:

[root@dn0 ~]# ll /etc/hadoop
total 8
lrwxrwxrwx. 1 root root   29 Feb 14 12:10 conf -> /etc/alternatives/hadoop-conf
drwxr-xr-x. 2 root root 4096 Feb 14 12:10 conf.cloudera.CD-HDFS-gbUrTxBt
drwxr-xr-x. 2 root root 4096 Feb 14 12:13 conf.cloudera.CD-YARN-VAJUGMaj
[root@dn0 ~]# ll /etc/alternatives/hadoop-conf
lrwxrwxrwx. 1 root root 42 Feb 14 12:10 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj

The YARN process runs as the yarn user, I presume, so for some reason the wrong permissions are being applied to container-executor.cfg.
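
A quick way to test that directly (assuming sudo is available on the node):

# This should fail with "Permission denied" if the yarn user itself cannot
# read the root-owned, mode-400 file.
sudo -u yarn cat /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg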

 

Just out of interest, the contents are:

[root@dn0 ~]# cat /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
allowed.system.users=nobody,impala,hive,llama,hbase
banned.users=hdfs,yarn,mapred,bin

 

When I look at our other cluster, which uses neither Kerberos nor Cloudera Director, I see the following permissions:

[root@????? ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r-- 1 root root 318 Jun  1  2016 /log/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/etc/hadoop/conf.empty/container-executor.cfg
-r--r--r-- 1 root hadoop 0 Jan 23 05:37 /etc/hadoop/conf.cloudera.yarn/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:37 /var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_593636475737667066/yarn-conf/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:06 /var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_-6875379618642481202/yarn-conf/container-executor.cfg
-r-------- 1 root hadoop 0 Jan 23 05:37 /var/run/cloudera-scm-agent/process/1056-yarn-NODEMANAGER/container-executor.cfg
[root@????? ~]# ll /etc/hadoop
total 8
lrwxrwxrwx 1 root root   29 Jan 31 08:29 conf -> /etc/alternatives/hadoop-conf
drwxr-xr-x 2 root root 4096 Jan 23 05:37 conf.cloudera.hdfs
drwxr-xr-x 2 root root 4096 Jan 31 08:29 conf.cloudera.yarn
[root@????? ~]#

These look more reasonable.

 

Can anybody give me a clue as to how these permissions are getting (or not getting) set? Since this is Cloudera Director, it's out of my control how they are applied.

 

1 ACCEPTED SOLUTION

Champion
The permissions on container-executor.cfg are correct. It should be 400 and root:hadoop.

Find and check the actual binary, container-executor. Also, review all of the configs, as a secured cluster switches from the default executor to the LinuxContainerExecutor.

https://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/SecureMode.html#LinuxContain...


5 REPLIES

Contributor

I used Cloudera Director to build a cluster without Kerberos. YARN came up okay and the permissions were as follows:

[root@dn2 ~]# find / -name container-executor.cfg -exec ls -l {} \;
-rw-r--r--. 13 cloudera-scm cloudera-scm 318 Jan 20 21:38 /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/etc/hadoop/conf.empty/container-executor.cfg
-r--------. 1 root hadoop 0 Feb 14 15:16 /run/cloudera-scm-agent/process/44-yarn-NODEMANAGER/container-executor.cfg
-r--------. 1 root hadoop 0 Feb 14 15:16 /etc/hadoop/conf.cloudera.CD-YARN-uMqvpvqg/container-executor.cfg

The permissions are the same, so it seems they are not the issue. Perhaps it is the contents.

Any clues?
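
In case it is the contents, a simple check would be to diff the generated file between the two clusters; a rough sketch, assuming root ssh between the hosts (dn0 is the kerberised node, dn2 the non-kerberised one):

# Pull both generated files and compare them; the conf.cloudera.* directory
# names differ per cluster, as shown in the listings above.
ssh dn0 cat /etc/hadoop/conf.cloudera.CD-YARN-VAJUGMaj/container-executor.cfg > /tmp/ce-secure.cfg
ssh dn2 cat /etc/hadoop/conf.cloudera.CD-YARN-uMqvpvqg/container-executor.cfg > /tmp/ce-plain.cfg
diff /tmp/ce-secure.cfg /tmp/ce-plain.cfg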

Contributor

Actually, digging into this a bit more, I think it is the permissions on container-executor.cfg that are causing the issue.

 

The NodeManager is launched as the yarn user:

 

yarn     17040 17035  0 16:53 ?        00:00:00 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/yarn/yarn.sh nodemanager

And from here http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/SecureContainer.html:

 

On Linux environment the secure container executor is the LinuxContainerExecutor. It uses an external program called the container-executor to launch the container. This program has the setuid access right flag set which allows it to launch the container with the permissions of the YARN application user.

 

This would explain why this is only happening on secure clusters built by Director.
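
Given that, it seems worth checking the binary itself. A sketch, assuming the standard CDH parcel layout (the exact path may differ):

# On a healthy secure cluster the binary is expected to be setuid root
# (something like ---Sr-s---, owner root, group matching
# yarn.nodemanager.linux-container-executor.group).
ls -l /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop-yarn/bin/container-executor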

 

It seems that container-executor.cfg is created and populated at NodeManager restart time, so I cannot change the permissions on the cfg file to test.

 

Is there a reason why these cfg files are created with 400 and not 444? Should they be 444 on secure clusters? Can this be changed, and where?

 

Thanks

Champion
The permissions on container-executor.cfg are correct. It should be 400 and root:hadoop.

Find and check the actual binary, container-executor. Also, review all of the configs, as a secured cluster switches from the default executor to the LinuxContainerExecutor.

https://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/SecureMode.html#LinuxContain...
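
For the config review, the two settings that matter most here are the executor class and the executor group. A quick way to see what the NodeManager was actually started with, using the process directory from your find output (the number, 52 here, changes on every restart):

# On a kerberised cluster yarn.nodemanager.container-executor.class should be
# org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor, and
# yarn.nodemanager.linux-container-executor.group should match the group in
# container-executor.cfg.
grep -A1 container-executor /run/cloudera-scm-agent/process/52-yarn-NODEMANAGER/yarn-site.xml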

Contributor

Hi,

 

Looking at that document, I see:

 

conf/container-executor.cfg
The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above.

The configuration file must be owned by the user running NodeManager (user yarn in the above example), group-owned by anyone and should have the permissions 0400 or r--------.

This makes sense: if the container-executor runs as yarn, how else could it read the configuration?

 

Does anyone have a running Kerberos cluster who can confirm the permissions?
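
If it helps anyone to compare, here is a one-liner that captures everything relevant from a node in one pass (paths assume the same parcel layout as above):

# Report mode, owner and group for the generated cfg files and the binary.
stat -c '%A %U:%G %n' \
  /etc/hadoop/conf.cloudera.*/container-executor.cfg \
  /run/cloudera-scm-agent/process/*-yarn-NODEMANAGER/container-executor.cfg \
  /opt/cloudera/parcels/CDH-*/lib/hadoop-yarn/bin/container-executor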

Contributor

Okay, I have it.

 

I was using the parcel_provisioner.sh script to preload the parcels into Docker images. However, during the pre-extraction the permissions on container-executor weren't being set properly. For now, turning off the pre-extraction works. I'll also test by manually setting the permissions, but I'm wondering how many other permissions aren't being set properly.
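
For reference, the manual fix I intend to test is just restoring what the agent would normally apply when it extracts the parcel itself; the exact values below (root:yarn, mode 6050) are my assumption of the defaults, so worth confirming against a node where pre-extraction was turned off:

# Reapply ownership and the setuid/setgid bits to the pre-extracted binary.
chown root:yarn /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop-yarn/bin/container-executor
chmod 6050 /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop-yarn/bin/container-executor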

 

FYI, the root:hadoop 400 permissions work because of the setuid flag on the container-executor binary. Now everything makes sense.

 

Thanks for the help!