Unable to start YARN - NodeManager on role NodeManager

Explorer

 

Execute command Start this NodeManager on role NodeManager 

Failed to start role

Supervisor returned FATAL. Please check the role log file, stderr, or stdout.

 

Environment details:

 

Version: Cloudera Express 5.15.0
Java VM Name: Java HotSpot(TM) 64-Bit Server VM
Java VM Vendor: Oracle Corporation
Java Version: 1.7.0_67
 
System details:
Linux optim-rhel72-uppu.development.unicomglobal.software 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

 

I have followed the steps under "Configuring TLS/SSL for HDFS, YARN and MapReduce" using the link https://www.cloudera.com/documentation/enterprise/5-15-x/topics/sg_hive_encryption.html
 
Service did not start successfully; not all of the required roles started: only 0/1 roles started. Reasons : Service has only 0 NodeManager roles running instead of minimum required 1

I see the error below in the role log:

 

Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
 at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
 at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:199)
 at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
 ... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
 at org.apache.hadoop.util.Shell.run(Shell.java:507)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
 ... 4 more
 
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at optim-rhel72-uppu.development.unicomglobal.software/10.1.72.3
************************************************************/
 
Any help is highly appreciated.
 
Thanks,
Tulasi
 

 

 

 

1 ACCEPTED SOLUTION

Guru

Hi @Tulasi,

 

Sorry for the late reply. From the output you sent:

 

[root@optim-rhel72-uppu ~]# id yarn
uid=1007(yarn) gid=1010(hadoop) groups=1010(hadoop)

 

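This looks a little different from my test cluster: here the yarn user's primary group is hadoop, and the user is not a member of the yarn group. Can you please run the following?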

usermod -g yarn yarn
usermod -a -G hadoop yarn
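
One possible reading of why group membership matters here: the container-executor binary listed elsewhere in this thread is owned by root:yarn with mode 6050, so only members of the yarn group can execute it, while the id output above lists only hadoop. The sketch below checks an id output string for a group name. The function name has_group is hypothetical and is not part of any Cloudera or Hadoop tooling.

```shell
#!/bin/sh
# has_group ID_OUTPUT GROUP
# Hypothetical helper: succeeds if GROUP appears in the "groups=" field
# of a string in the format printed by `id USER`.
has_group() {
    printf '%s\n' "$1" \
        | sed -n 's/.*groups=\([^ ]*\).*/\1/p' \
        | tr ',' '\n' \
        | sed 's/^[0-9]*(\(.*\))$/\1/' \
        | grep -qxF -- "$2"
}

# The id output quoted in this thread.
before='uid=1007(yarn) gid=1010(hadoop) groups=1010(hadoop)'
if has_group "$before" yarn; then
    echo "yarn group: present"
else
    echo "yarn group: missing"
fi
```

On a live host the first argument would be "$(id yarn)". After the two usermod commands, the check would be expected to succeed for both yarn and hadoop.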

 

Also, please paste the content of this file:

/opt/cloudera/parcels/CDH/meta/permissions.json

 

Thanks,

Li 

Li Wang, Technical Solution Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

Learn more about the Cloudera Community:

Terms of Service

Community Guidelines

How to use the forum


17 REPLIES

Expert Contributor
Hi Tulasi,

Could you please verify that the "container-executor.group" value is the same in both Cloudera Manager (YARN -> Configuration -> Container Executor Group) and /etc/hadoop/conf.cloudera.yarn/container-executor.cfg (on the NodeManager host)?
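
A minimal sketch of how the file side of that comparison could be read from the command line. The helper name cfg_get is hypothetical, and the demo runs against a throwaway file with sample values rather than the real configuration, which is typically readable only by root.

```shell
#!/bin/sh
# cfg_get FILE KEY
# Hypothetical helper: prints the value of KEY from a key=value file
# such as container-executor.cfg. Prints nothing if KEY is absent.
cfg_get() {
    awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1"
}

# Demo on a temporary file so the sketch runs without a cluster.
demo=$(mktemp)
printf '%s\n' \
    'yarn.nodemanager.linux-container-executor.group=yarn' \
    'min.user.id=1000' > "$demo"
cfg_get "$demo" yarn.nodemanager.linux-container-executor.group
rm -f "$demo"
```

On the NodeManager host, the first argument would be /etc/hadoop/conf.cloudera.yarn/container-executor.cfg, and the printed value can then be compared by eye with the Container Executor Group shown in Cloudera Manager.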

Let us know if you have questions

Thanks
Jerry

Explorer

Hi Jerry,

 

Here is what I have on my system:

 

/etc/hadoop/conf.cloudera.yarn

-r-------- 1 root hadoop  156 Jan 24 01:00 container-executor.cfg
-rw-r--r-- 1 root root   3894 Jan 17 22:56 core-site.xml
-rw-r--r-- 1 root root    617 Jan 17 22:56 hadoop-env.sh
-rw-r--r-- 1 root root   2729 Jan 17 22:56 hdfs-site.xml

 

Even if I change the permissions on the file above, they revert to the same values after a start.

 

In Cloudera Manager I have this:

Container Executor Group = yarn

 

An upgrade is also not possible, because it requires all services to be up and running.

 

Let me know if you need any more details.

 

Thanks,

Tulasi

 

Expert Contributor

Hi Tulasi,

 

Could you check the value of the Container Executor Group property in the "container-executor.cfg" file and cross-check it against the CM configuration?

 

Thanks

Jerry

avatar
Explorer
Hi Jerry,

Yes, the value of the "Container Executor Group" property matches CM, see below:

[root@optim-rhel72-uppu conf.cloudera.yarn]# cat container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
allowed.system.users=nobody,impala,hive,llama,hbase
banned.users=hdfs,yarn,mapred,bin

Thanks,
Tulasi

Guru

Hi @Tulasi

 

Could you please send us the output of the command below from all the NodeManager hosts?

 

ls -alt /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor

 

The correct permissions should look like this:

---Sr-s--- 1 root yarn 53728 Jan 28 14:03 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor

 

If it looks different, you can perform the following steps on all NodeManagers (set the group first, because changing ownership can clear the setuid and setgid bits):

chgrp yarn /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
chmod 6050 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
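
The listing ---Sr-s--- corresponds to octal mode 6050: setuid and setgid set, no owner permissions, read and execute for the group, nothing for others. A minimal sketch of a scripted check follows. The helper name mode_is is hypothetical, it assumes GNU stat (the -c option, as on RHEL), and the demo uses a throwaway file so that it runs without root.

```shell
#!/bin/sh
# mode_is FILE OCTAL_MODE
# Hypothetical helper: succeeds if FILE's permission bits equal OCTAL_MODE.
# Returns 2 if stat itself fails (for example, the file does not exist).
mode_is() {
    actual=$(stat -c '%a' "$1") || return 2
    [ "$actual" = "$2" ]
}

# Demo on a temporary file with an ordinary mode.
demo=$(mktemp)
chmod 640 "$demo"
if mode_is "$demo" 6050; then
    echo "mode ok"
else
    echo "mode differs: $(stat -c '%a %U:%G' "$demo")"
fi
rm -f "$demo"
```

On a NodeManager host the call would be mode_is /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor 6050, with the owner and group checked separately from the stat output.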

Thanks and hope this helps,

Li

Li Wang, Technical Solution Manager



Explorer

Hi Li,

 

I changed the permissions, but that still didn't fix the problem.

 

[root@optim-rhel72-uppu bin]# ls -alt /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
---Sr-s--- 1 root yarn 53712 May 24  2018 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor

 

Could it have something to do with banned.users in the file below?

 

[root@optim-rhel72-uppu conf.cloudera.yarn]# cat /etc/hadoop/conf.cloudera.yarn/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
allowed.system.users=nobody,impala,hive,llama,hbase
banned.users=hdfs,yarn,mapred,bin

 

Thanks,

Tulasi

Guru

Hi @Tulasi,

 

The banned.users property prevents jobs from being submitted under those user accounts. It should not prevent the NodeManager from starting.
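
To illustrate how such a list can be inspected, here is a minimal sketch that tests whether an account name appears in a comma-separated value of container-executor.cfg. The helper name cfg_list_has is hypothetical, and the demo file reuses the values posted in this thread.

```shell
#!/bin/sh
# cfg_list_has FILE KEY ITEM
# Hypothetical helper: succeeds if ITEM is one of the comma-separated
# entries in the value of KEY.
cfg_list_has() {
    awk -v k="$2" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$1" \
        | tr ',' '\n' \
        | grep -qxF -- "$3"
}

# Demo on a temporary file with the values from this thread.
demo=$(mktemp)
printf '%s\n' \
    'allowed.system.users=nobody,impala,hive,llama,hbase' \
    'banned.users=hdfs,yarn,mapred,bin' > "$demo"
for user in yarn hive; do
    if cfg_list_has "$demo" banned.users "$user"; then
        echo "$user: listed in banned.users"
    else
        echo "$user: not listed in banned.users"
    fi
done
rm -f "$demo"
```

The list applies to the accounts that jobs run as, which is why yarn appearing in it is unrelated to the NodeManager daemon failing to start.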

 

I suggest checking these doc links:

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cdh_sg_other_hadoop_security.html#to...

and

https://www.cloudera.com/documentation/enterprise/5-15-x/topics/cdh_sg_yarn_container_exec_errors.ht...

 

How many nodes does your cluster have? Have you checked all the permissions?

 

Thanks,

Li

Li Wang, Technical Solution Manager



Explorer

Hi Li,

 

Everything is on a single node.

 

/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin
[root@optim-rhel72-uppu bin]# ls -lrt
total 80
-rwxr-xr-x 1 root root 12476 May 24  2018 yarn
-rwxr-xr-x 1 root root  5463 May 24  2018 mapred
---Sr-s--- 1 root yarn 53712 May 24  2018 container-executor

 

This is the error from /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-optim-rhel72-uppu.development.unicomglobal.software.log.out

 

2019-01-30 23:42:45,872 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: java.io.IOException: Cannot run program "/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn/bin/container-executor": error=13, Permission denied

 

Thanks,

Tulasi

Guru

Hi @Tulasi,

 

Could you please run the command below and send us the output?

id yarn

 

Thanks,

Li

Li Wang, Technical Solution Manager

