YARN NodeManagers failed to start with permission issue after Kerberization in Cloudera Runtime 7.0.3

Dear Community,

 

After a successful installation of Cloudera Runtime 7.0.3, we tried to Kerberize the cluster. (We did the same before with 5.14.)

Everything went fine with the Kerberos wizard, but afterwards, in the config deployment phase, the YARN NodeManagers failed to start with the following error:

 

Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:394)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016)
Caused by: java.io.IOException: Linux container executor not configured properly (error=-1)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:307)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:392)
	... 3 more
Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:183)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:206)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:300)
	... 4 more
Caused by: java.io.IOException: Cannot run program "/var/lib/yarn-ce/bin/container-executor": error=13, Permission denied
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:938)
	at org.apache.hadoop.util.Shell.run(Shell.java:901)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
	... 6 more
Caused by: java.io.IOException: error=13, Permission denied
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 10 more

We searched the internet for an answer and found that the yarn user should be added to the hadoop group.

 

[root@dwh20-i-cdhwt01 ~]# id yarn
uid=485(yarn) gid=984(yarn) groups=984(yarn),988(hadoop)

 

That did not help: the NodeManagers still didn't start.

The permissions on the container-executor binary:

 

[root@dwh20-i-cdhwt01 ~]# ll /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 root yarn 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor

 

(For information: we didn't find a container-executor config file on the NodeManagers.)

 

Could somebody please help us? 🙂

Thanks.

 

Regards,

Gabor 

 

1 ACCEPTED SOLUTION

Master Guru

Hi @Dombai_Gabor ,

 

One possible cause of this issue is that the volume is mounted with "noexec". Since your permissions and group membership seem correct, it is reasonable to check /etc/fstab to see if "noexec" is set where /var is mounted.

 

Ben
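
As an alternative to reading /etc/fstab by hand, the options currently in effect can be queried for the filesystem that holds the binary. This is only a sketch: it assumes findmnt from util-linux is installed, and the helper names has_mount_opt and check_noexec are made up for illustration.

```shell
# Return success if the comma-separated option list $1 contains option $2.
# has_mount_opt is a hypothetical helper, not part of any Cloudera tooling.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report whether the filesystem containing path $1 is mounted noexec.
# Assumes findmnt (util-linux) is available on the host.
check_noexec() {
    opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
    if has_mount_opt "$opts" noexec; then
        echo "noexec is set on the filesystem holding $1"
    else
        echo "noexec is not set on the filesystem holding $1"
    fi
}

# Example: check_noexec /var/lib/yarn-ce/bin/container-executor
```

Wrapping the option list in commas before matching keeps "exec" from matching inside "noexec".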


9 REPLIES

Can you share the output of

ls -nl /var/lib/yarn-ce/bin/container-executor

This is to check that the numeric gid of yarn matches the group on container-executor.

Yes! Here you are. 🙂

 

[root@dwh20-i-cdhwt01 ~]# ls -nl /var/lib/yarn-ce/bin/container-executor
---Sr-s--- 1 0 984 103968 Nov 19 12:34 /var/lib/yarn-ce/bin/container-executor

I checked it. Yes, the /var mount point has the noexec option. I stopped the Cloudera services, edited the fstab, and rebooted the hosts, but they don't boot now. 😞

Master Guru

@Dombai_Gabor,

 

I'm sorry to hear that... I think you mean that the OS won't boot; if so, let us know what happens and perhaps we can help. I'm not too familiar with debugging OS boot problems offhand, but others might be able to provide some insight.

OK guys, there are no problems now. 🙂 There was a typo in our fstab file (the s was missing from the defaults option).

 

Separately, there was a noexec option on the /var mount point. After I removed it, everything finally works.

The NodeManagers are working properly.
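
Since a one-character typo in /etc/fstab was enough to keep the hosts from booting, it may be worth looking at the edited entry before rebooting. A rough sketch follows; fstab_opts_for is a hypothetical helper, and the remount line in the comments is just one way to try the change on a running system.

```shell
# Print the options field (4th column) of the fstab entry whose mount point is $1.
# $2 is the fstab file to read and defaults to /etc/fstab.
# fstab_opts_for is a hypothetical helper written for this example.
fstab_opts_for() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $4 }' "${2:-/etc/fstab}"
}

# Example (run as root, before rebooting):
#   fstab_opts_for /var            # inspect the options, e.g. "defaults"
#   mount -o remount,exec /var     # try the change on the live system
```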

 

Thanks guys. 😄

Adding to Ben's suggestion: the link below has additional information on the mount requirements for container-executor.

https://docs.cloudera.com/runtime/7.0.3/yarn-security/topics/yarn-linux-container-executor.html

Quote from Doc:

Make sure the mount point for the parcel directory is without the nosuid option.

The container-executor program must have a very specific set of permissions and ownership to function correctly. In particular, it must:

  1. Be owned by root.
  2. Be owned by a group that contains only the user running the YARN daemons.
  3. Be setuid.
  4. Be group readable and executable.

This corresponds to the ownership root:yarn and the permissions 6050.
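
The requirements quoted above can be checked with a few string comparisons. This is a minimal sketch, not a Cloudera or Hadoop command: ce_perms_ok is a hypothetical helper, and the example in the comments assumes GNU stat and the path from this thread.

```shell
# Succeed only if the values match what the quoted documentation asks for:
# owner uid 0 (root), group equal to the expected gid, and mode 6050.
# Arguments: owner-uid  group-gid  octal-mode  expected-gid
# ce_perms_ok is a hypothetical helper written for this example.
ce_perms_ok() {
    [ "$1" = "0" ] && [ "$2" = "$4" ] && [ "$3" = "6050" ]
}

# Example with GNU stat, which prints uid, gid and octal mode:
#   ce_perms_ok $(stat -c '%u %g %a' /var/lib/yarn-ce/bin/container-executor) "$(id -g yarn)" \
#       && echo "ownership and mode look right"
```

With the values posted earlier in the thread (uid 0, gid 984, yarn's gid 984, mode ---Sr-s---), this check would pass, which is consistent with the problem being the mount options rather than the file itself.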

 

 

New Contributor

You will first need to make sure the file group is set to yarn:

1. Set the group, then the mode:

# chgrp yarn container-executor
# chmod 6050 container-executor

2. Run ls -l to confirm that the permissions are now:

    ---Sr-s--- 1

3. To check the ACL, run:

# getfacl container-executor

New Contributor

I resolved this problem by running the 'usermod -G yarn yarn' command.
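
One thing to keep in mind with this command: usermod -G sets the complete list of supplementary groups, so any group not listed (such as hadoop in the original post) is dropped, while usermod -a -G appends instead. A small sketch for confirming membership afterwards; in_group_list is a hypothetical helper.

```shell
# Succeed if the space-separated group list $1 contains group name $2.
# in_group_list is a hypothetical helper written for this example.
in_group_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   in_group_list "$(id -nG yarn)" hadoop && echo "yarn is in hadoop"
#   usermod -a -G hadoop yarn    # append a group without dropping the others
```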