Created on 02-28-2020 06:13 AM - last edited on 02-28-2020 06:34 AM by cjervis
Dear Community,
After a successful installation of Cloudera Runtime 7.0.3, we tried to Kerberize the cluster. (We did the same before with 5.14.)
Everything went fine with the Kerberos wizard, but afterwards, in the config deployment phase, the YARN NodeManagers failed to start with the following error:
Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:394)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
Caused by: java.io.IOException: Linux container executor not configured properly (error=-1)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:307)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:392)
    ... 3 more
Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:300)
    ... 4 more
Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:938)
    at org.apache.hadoop.util.Shell.run(Shell.java:901)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
    ... 6 more
Caused by: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 10 more
We searched the internet for an answer and found that we should add the yarn user to the hadoop group.
[root@dwh20-i-cdhwt01 ~]# id yarn
uid=485(yarn) gid=984(yarn) groups=984(yarn),988(hadoop)
Nothing changed; the NodeManagers still didn't start.
The container-executor permissions:
[root@dwh20-i-cdhwt01 ~]# ll /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 root yarn 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor
(Just for information, we didn't find a container-executor config file on the NodeManagers.)
Could somebody please help us? 🙂
Thanks.
Regards,
Gabor
Created 02-28-2020 08:00 AM
Can you share the result of:
ls -nl /var/lib/yarn-ce/bin/container-executor
This is to make sure the numeric group ID on the container-executor matches the gid of yarn.
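The same comparison can be scripted instead of read off the listing. This is only a rough sketch: `file_gid_matches` is a made-up helper name, and it assumes GNU `stat` (where `-c %g` prints the numeric group ID), as found on typical Linux hosts.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the numeric group owner of file $1 equals gid $2.
# Assumes GNU stat; "-c %g" prints the file's numeric group ID.
file_gid_matches() {
    [ "$(stat -c %g "$1")" = "$2" ]
}

# Example, to be run on a NodeManager host (id -g prints the user's primary gid):
#   file_gid_matches /var/lib/yarn-ce/bin/container-executor "$(id -g yarn)" \
#       && echo "group matches" || echo "group differs"
```

With the values posted in this thread (gid 984 for yarn), the helper would be expected to succeed when the file's group column in `ls -nl` also shows 984.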
Created 02-28-2020 09:33 AM
Yes! Here you are. 🙂
[root@dwh20-i-cdhwt01 ~]# ls -nl /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 0 984 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor
Created 02-28-2020 10:43 AM
Hi @Dombai_Gabor ,
One possible cause of this issue is that the volume is mounted with "noexec". Since your permissions and group membership seem correct, it is reasonable to check /etc/fstab to see if "noexec" is set where /var is mounted.
Ben
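Besides reading /etc/fstab, the options of the live mount can be inspected directly. The following is a minimal sketch, not an official procedure: the helper names `has_mount_option` and `mount_options_for` are invented for illustration, and it assumes `findmnt` from util-linux is installed.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the comma-separated option list $1
# contains exactly the option $2 (e.g. "noexec").
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: print the mount options of the filesystem holding path $1.
# findmnt --target resolves the mount point that contains the given path.
mount_options_for() {
    findmnt -n -o OPTIONS --target "$1"
}

# Example, to be run on a NodeManager host:
#   opts=$(mount_options_for /var/lib/yarn-ce/bin/container-executor)
#   if has_mount_option "$opts" noexec; then
#       echo "noexec is set on the mount holding container-executor"
#   fi
```

Checking the live mount rather than only the fstab entry also catches the case where the file was edited but the filesystem has not been remounted yet.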
Created 02-28-2020 11:07 AM
I checked it. Yes, the /var mount point has the noexec option. I stopped the Cloudera services, edited the fstab, and rebooted the hosts, but they don't boot now. 😞
Created 02-28-2020 04:38 PM
I'm sorry to hear that... I think you mean that the OS won't boot; if so, let us know what happens and perhaps we can help. I'm not too familiar with debugging OS boot problems offhand, but others might be able to provide some insight.
Created 02-29-2020 12:58 AM
Ok guys, there are no problems now. 🙂 There was a typo in our fstab file (the "s" was missing from the defaults option).
As for the original issue, there was a noexec option on the /var mount point. After I removed it, everything is finally working.
The NodeManagers are working properly.
Thanks guys. 😄
Created 02-28-2020 06:31 PM
Adding on top of Ben's suggestion: in this link you can find additional info on the requirements for the container-executor mount:
https://docs.cloudera.com/runtime/7.0.3/yarn-security/topics/yarn-linux-container-executor.html
Quote from Doc:
make sure the mount point for the parcel directory is without the nosuid option.
The container-executor program must have a very specific set of permissions and ownership to function correctly. In particular, it must:
Created 02-02-2021 02:33 PM
You will first need to make sure the file group is set to yarn:
1. Set the group, then set the file mode:
# chgrp yarn container-executor
# chmod 6050 container-executor
2. Run ls -l to check that the permissions are set to:
---Sr-s--- 1
3. To check the ACL, run the following:
getfacl container-executor
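The ls -l check in step 2 can also be done numerically. This is a small sketch under assumptions: `file_mode_is` is a hypothetical helper, and it relies on GNU `stat`, where `-c %a` prints the octal mode including the setuid and setgid bits.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if file $1 has the octal mode $2.
# Assumes GNU stat; "-c %a" prints the mode in octal, special bits included.
file_mode_is() {
    [ "$(stat -c %a "$1")" = "$2" ]
}

# Example, to be run in the directory that holds container-executor:
#   file_mode_is container-executor 6050 \
#       && echo "mode is 6050 (---Sr-s---)" || echo "unexpected mode"
```

Mode 6050 is setuid plus setgid with read and execute for the group only, which is what the `---Sr-s---` listing shows.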
Created on 09-09-2020 02:43 AM - edited 09-09-2020 02:50 AM
I resolved this problem by running the 'usermod -G yarn yarn' command.