YARN NodeManagers failed to start with permission issue after Kerberization in Cloudera Runtime 7.0.3
Created on 02-28-2020 06:13 AM - last edited on 02-28-2020 06:34 AM by cjervis
Dear Community,
After a successful installation of Cloudera Runtime 7.0.3, we tried to Kerberize the cluster. (We had done the same before with 5.14.)
Everything went fine in the Kerberos wizard, but afterwards, in the config deployment phase, the YARN NodeManagers failed to start with the following error:
Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:394)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
Caused by: java.io.IOException: Linux container executor not configured properly (error=-1)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:307)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:392)
    ... 3 more
Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:300)
    ... 4 more
Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:938)
    at org.apache.hadoop.util.Shell.run(Shell.java:901)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
    ... 6 more
Caused by: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 10 more
We searched the internet for an answer and found that we should add the yarn user to the hadoop group.
[root@dwh20-i-cdhwt01 ~]# id yarn
uid=485(yarn) gid=984(yarn) groups=984(yarn),988(hadoop)
That changed nothing; the NodeManagers still didn't start.
The permissions on container-executor:
[root@dwh20-i-cdhwt01 ~]# ll /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 root yarn 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor
(Just for information, we didn't find a container-executor config file on the NodeManagers.)
Please somebody help us. 🙂
Thanks.
Regards,
Gabor
Created ‎02-28-2020 08:00 AM
Can you share the output of the following command?
ls -nl /var/lib/yarn-ce/bin/container-executor
This is to make sure the numeric IDs of yarn match the ones on container-executor.
Created ‎02-28-2020 09:33 AM
Yes! Here you are. 🙂
[root@dwh20-i-cdhwt01 ~]# ls -nl /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 0 984 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor
Created ‎02-28-2020 10:43 AM
Hi @Dombai_Gabor ,
One possible cause of this issue is that the volume is mounted with "noexec". Since your permissions and group membership seem correct, it is reasonable to check /etc/fstab to see whether "noexec" is set on the mount point that holds /var.
Ben
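Editor's sketch of the check suggested above: besides reading /etc/fstab, the options of the live mount can be inspected with findmnt. The helper name has_mount_option is made up for illustration, and the path is the one from the stack trace in this thread; adjust it if your layout differs.

```shell
#!/bin/sh
# Path taken from the stack trace in this thread.
CE=/var/lib/yarn-ce/bin/container-executor

# has_mount_option OPTIONS NAME   (hypothetical helper)
# Succeeds if NAME appears as a whole entry in a comma-separated
# mount option string such as "rw,nosuid,noexec,relatime".
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -e "$CE" ]; then
    # findmnt -T resolves the filesystem that contains the given path.
    opts=$(findmnt -n -o OPTIONS -T "$CE")
    for bad in noexec nosuid; do
        if has_mount_option "$opts" "$bad"; then
            echo "WARNING: filesystem holding $CE is mounted with $bad"
        fi
    done
fi
```

Wrapping the option string in commas keeps "exec" from matching inside "noexec". The nosuid check is included because a setuid binary is also affected by that option.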
Created ‎02-28-2020 11:07 AM
I checked it. Yes, the /var mount point has the noexec option. I stopped the Cloudera services, edited fstab, and rebooted the hosts, but now they don't boot. 😞
Created ‎02-28-2020 04:38 PM
I'm sorry to hear that... I think you mean that the OS won't boot; if so, let us know what happens and perhaps we can help. I'm not too familiar with debugging OS boot problems offhand, but others might be able to provide some insight.
Created ‎02-29-2020 12:58 AM
Ok guys, there are no problems now. 🙂 There was a typo in our fstab file (the s was missing from the defaults option), which is why the hosts didn't boot.
As for the original issue, the /var mount point did have the noexec option. After I removed it, everything finally works.
The NodeManagers are working properly.
Thanks guys. 😄
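Editor's sketch, for anyone repeating this fix: it can help to read back the options field of the /var entry after editing /etc/fstab and before rebooting. The helper name fstab_options is invented for this example, and the assumption is the usual whitespace-separated six-column fstab layout.

```shell
#!/bin/sh
# fstab_options FSTAB MOUNTPOINT   (hypothetical helper)
# Prints the options field (4th column) of the first non-comment
# entry whose mount point (2nd column) equals MOUNTPOINT.
fstab_options() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { print $4; exit }' "$1"
}

if [ -r /etc/fstab ]; then
    echo "/var options in /etc/fstab: $(fstab_options /etc/fstab /var)"
fi

# Run by hand as root once the entry looks right (not executed here):
#   mount -o remount,exec /var     # apply "exec" to the live mount
#   findmnt -n -o OPTIONS /var     # confirm what is active now
```

Seeing the option string printed on its own makes a misspelling such as "default" easier to spot, and the remount applies the change to the running system so it can be tested before any reboot.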
Created ‎02-28-2020 06:31 PM
Adding to Ben's suggestion: the link below has additional information on the mount requirements for container-executor.
https://docs.cloudera.com/runtime/7.0.3/yarn-security/topics/yarn-linux-container-executor.html
Quote from Doc:
make sure the mount point for the parcel directory is without the nosuid option.
The container-executor program must have a very specific set of permissions and ownership to function correctly. In particular, it must:
- Be owned by root.
- Be owned by a group that contains only the user running the YARN daemons.
- Be setuid.
- Be group readable and executable.
This corresponds to the ownership root:yarn and the permissions 6050.
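Editor's sketch of how the quoted requirements could be checked in one go. In the listing ---Sr-s---, the S in the owner slot is setuid without owner execute (4000), and r-s in the group slot is group read and execute plus setgid (2000 + 040 + 010), which adds up to 6050. The helper name ce_perms_ok is made up for illustration, and stat -c is the GNU coreutils form.

```shell
#!/bin/sh
# Path taken from the stack trace in this thread.
CE=/var/lib/yarn-ce/bin/container-executor

# ce_perms_ok "MODE OWNER GROUP"   (hypothetical helper)
# Succeeds only for octal mode 6050 with owner root and group yarn.
ce_perms_ok() {
    [ "$1" = "6050 root yarn" ]
}

if [ -e "$CE" ]; then
    # GNU stat: %a = octal mode, %U = owner name, %G = group name.
    actual=$(stat -c '%a %U %G' "$CE")
    if ce_perms_ok "$actual"; then
        echo "OK: $actual"
    else
        echo "Unexpected: $actual (expected 6050 root yarn)"
    fi
fi
```

A passing check here only rules out ownership and mode; as this thread shows, the binary can still fail with error=13 when the filesystem is mounted with noexec.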
Created ‎02-02-2021 02:33 PM
1. Make sure the file's group is set to yarn, then set the mode to 6050:
# chgrp yarn container-executor
# chmod 6050 container-executor
2. Run ls -l to confirm that the permissions are now:
---Sr-s--- 1
3. To check the ACL, run:
# getfacl container-executor
Created on ‎09-09-2020 02:43 AM - edited ‎09-09-2020 02:50 AM
I resolved this problem by running the 'usermod -G yarn yarn' command.
