Created on 01-24-2019 01:07 AM - edited 09-16-2022 07:05 AM
Execute command Start this NodeManager on role NodeManager
Failed to start role
Supervisor returned FATAL. Please check the role log file, stderr, or stdout.
Environment details:
Version: Cloudera Express 5.15.0
Java VM Name: Java HotSpot(TM) 64-Bit Server VM
Java VM Vendor: Oracle Corporation
Java Version: 1.7.0_67
System details:
Linux optim-rhel72-uppu.development.unicomglobal.software 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
I have followed the steps under "Configuring TLS/SSL for HDFS, YARN and MapReduce" using the link https://www.cloudera.com/documentation/enterprise/5-15-x/topics/sg_hive_encryption.html
Service did not start successfully; not all of the required roles started: only 0/1 roles started. Reasons : Service has only 0 NodeManager roles running instead of minimum required 1
I see the error below in the role log:
Created 01-28-2019 10:33 AM
Created 01-28-2019 10:43 PM
Hi Jerry,
Here is what I have on my system:
/etc/hadoop/conf.cloudera.yarn
-r-------- 1 root hadoop 156 Jan 24 01:00 container-executor.cfg
-rw-r--r-- 1 root root 3894 Jan 17 22:56 core-site.xml
-rw-r--r-- 1 root root 617 Jan 17 22:56 hadoop-env.sh
-rw-r--r-- 1 root root 2729 Jan 17 22:56 hdfs-site.xml
Even if I change the permissions on the file above, they revert to the same values after a start.
From manager I have this
Container Executor Group = yarn
Upgrading is not possible either, because it requires all services to be up and running.
Let me know if you need any more details.
Thanks,
Tulasi
Created 01-29-2019 06:47 AM
Hi Tulasi,
Could you check the value of the Container Executor Group property in the file "container-executor.cfg" and cross-check it against the CM configuration?
Thanks
Jerry
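A minimal sketch of the cross-check suggested above, assuming container-executor.cfg is a plain key=value file. cfg_value is a hypothetical helper name, not a Cloudera tool; the path is the one shown earlier in this thread.

```shell
# cfg_value: hypothetical helper.
# Prints the value of key $2 from the key=value file $1 (first match only).
cfg_value() {
    awk -F= -v key="$2" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# Example on a NodeManager host (run as root, since the file is mode 0400):
# cfg_value /etc/hadoop/conf.cloudera.yarn/container-executor.cfg \
#     yarn.nodemanager.linux-container-executor.group
```

The printed value can then be compared by eye with the "Container Executor Group" setting in Cloudera Manager.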
Created 01-29-2019 09:20 AM
Created 01-29-2019 06:59 PM
Hi @Tulasi,
Could you please send us the output of the command below from all the NodeManager hosts?
ls -alt /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
The correct permission should be like this:
---Sr-s--- 1 root yarn 53728 Jan 28 14:03 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
If it looks different, you can perform the following steps on all NodeManagers:
chgrp yarn /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
chmod 6050 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager
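The permission check above can be sketched as a small shell function. The mode string ---Sr-s--- corresponds to octal 6050: setuid (4000) plus setgid (2000) plus read/execute for the group (050). check_perms is a hypothetical helper name, and the sketch assumes GNU coreutils stat (for the -c option), as found on RHEL.

```shell
# check_perms: hypothetical helper.
# $1 = path, $2 = expected octal mode, $3 = expected owner, $4 = expected group
check_perms() {
    # %a = octal mode (including setuid/setgid bits), %U = owner, %G = group
    actual=$(stat -c '%a %U %G' "$1") || return 2
    expected="$2 $3 $4"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $1 is $actual"
    else
        echo "MISMATCH: $1 is $actual, expected $expected"
        return 1
    fi
}

# Example on a NodeManager host:
# check_perms /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor 6050 root yarn
```

A match only confirms the file itself; the user running the NodeManager still has to belong to the owning group to execute it.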
Created 01-29-2019 08:36 PM
Hi Li,
I changed the permissions, but that did not fix the problem.
[root@optim-rhel72-uppu bin]# ls -alt /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
---Sr-s--- 1 root yarn 53712 May 24 2018 /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor
In the file below, could this have something to do with banned.users?
[root@optim-rhel72-uppu conf.cloudera.yarn]# cat /etc/hadoop/conf.cloudera.yarn/container-executor.cfg
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
allowed.system.users=nobody,impala,hive,llama,hbase
banned.users=hdfs,yarn,mapred,bin
Thanks,
Tulasi
Created 01-30-2019 10:24 AM
Hi @Tulasi,
The banned.users property prevents jobs from being submitted under those user accounts. It should not prevent the NodeManager from starting.
I suggest checking these doc links:
and
How many nodes does your cluster have? Have you checked all the permissions?
Thanks,
Li
Li Wang, Technical Solution Manager
Created 01-30-2019 11:51 PM
Hi Li,
Everything on a single node.
/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin
[root@optim-rhel72-uppu bin]# ls -lrt
total 80
-rwxr-xr-x 1 root root 12476 May 24 2018 yarn
-rwxr-xr-x 1 root root 5463 May 24 2018 mapred
---Sr-s--- 1 root yarn 53712 May 24 2018 container-executor
This is the error from /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-optim-rhel72-uppu.development.unicomglobal.software.log.out
2019-01-30 23:42:45,872 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:269)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: java.io.IOException: Cannot run program "/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop-yarn/bin/container-executor": error=13, Permission denied
Thanks,
Tulasi
Created 01-31-2019 01:19 PM
Hi @Tulasi,
Could you please run the command below and send us the output?
id yarn
Thanks,
Li
Li Wang, Technical Solution Manager
Created 01-31-2019 10:17 PM
Hi Li,
This is what I am getting:
[root@optim-rhel72-uppu ~]# id yarn
uid=1007(yarn) gid=1010(hadoop) groups=1010(hadoop)
Thanks,
Tulasi
Created 02-06-2019 03:49 AM
Is nosuid set on the mount point? I had a similar issue documented here: http://community.cloudera.com/t5/Cloudera-Manager-Installation/URGENT-Cluster-unavailable-after-upgr...
Created 02-07-2019 03:24 AM
This is the content of my /etc/fstab file
-----------------------------------
/dev/mapper/rhel_rhel72-root / xfs defaults 0 0
UUID=d762b842-5c87-4e4d-bc0e-7a6bad357604 /boot xfs defaults 0 0
/dev/mapper/rhel_rhel72-home /home xfs defaults 0 0
/dev/mapper/rhel_rhel72-swap swap swap defaults 0 0
-----------------------------------
Do I need to change anything?
Thanks.
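For reference, the "defaults" option in /etc/fstab does not include nosuid, so the entries above would not by themselves disable setuid binaries. The options actually in effect can differ from fstab, so a check against the live mount is a reasonable extra step. The following is a rough sketch: has_nosuid is a hypothetical helper name, and the findmnt invocation in the comment assumes util-linux is installed.

```shell
# has_nosuid: hypothetical helper.
# Succeeds if the comma-separated mount option string in $1 contains "nosuid".
has_nosuid() {
    case ",$1," in
        *,nosuid,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a live host (findmnt -T resolves the mount that holds the path):
# opts=$(findmnt -no OPTIONS -T /opt/cloudera/parcels)
# has_nosuid "$opts" && echo "nosuid is set: setuid/setgid bits are ignored here"
```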
Created 02-11-2019 02:53 PM
Hi @Tulasi,
Sorry for my late reply. From the output you sent below:
[root@optim-rhel72-uppu ~]# id yarn
uid=1007(yarn) gid=1010(hadoop) groups=1010(hadoop)
This looks a little different than my test cluster. Can you please do this?
usermod -g yarn yarn
usermod -a -G hadoop yarn
Also, please paste the content of this file:
/opt/cloudera/parcels/CDH/meta/permissions.json
Thanks,
Li
Li Wang, Technical Solution Manager
Created 02-12-2019 12:37 AM
Hi Li,
Thanks for being on top of it and helping me in solving the problem.
usermod -g yarn yarn
usermod -a -G hadoop yarn
Above two commands fixed my problem.
[root@optim-rhel72-uppu meta]# id yarn
uid=1007(yarn) gid=1008(yarn) groups=1008(yarn),1010(hadoop)
I have no idea how the yarn user's group membership got changed; all I did was follow the Cloudera instructions for enabling encryption.
Thanks to everyone who provided suggestions.
Problems like this take a lot of time to track down, and I would ask Cloudera to improve this situation.
Thanks,
Tulasi
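A likely explanation for why this worked, based on the outputs in this thread: container-executor is mode 6050 and owned by root:yarn, so only members of group yarn may execute it. With the yarn user in group hadoop only, launching the binary fails with error=13 (Permission denied). The membership test can be sketched as below; has_group is a hypothetical helper name that parses the text printed by the id command.

```shell
# has_group: hypothetical helper.
# Succeeds if the `id` output line in $1 lists group name $2 in its groups= field.
has_group() {
    case "$1" in
        *groups=*"($2)"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on a NodeManager host:
# has_group "$(id yarn)" yarn || echo "user yarn is not in group yarn"
```

Applied to the two id outputs quoted in this thread, the check fails for the line from before the usermod commands and succeeds for the line from after them.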
Created 02-12-2019 10:03 AM
Hi @Tulasi,
Great to hear the issue got resolved! I will report this internally to our documentation team to see how we can improve it.
Thanks,
Li
Li Wang, Technical Solution Manager
Created 01-29-2019 04:48 AM
Hi @Tulasi,
Try killing any process running on the ports used by the YARN services, and then restart.
Regards,
Manu.
Created 01-29-2019 09:18 AM