URGENT Cluster unavailable after upgrade to 5.16!

Expert Contributor

Hi,

I've just upgraded the cluster from 5.14 to 5.16, but none of the NodeManagers will start. They fail with this error:

2018-12-04 13:26:15,283 DEBUG org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: checkLinuxExecutorSetup: [/var/lib/yarn-ce/bin/container-executor, --checksetup]
2018-12-04 13:26:15,287 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Invalid conf file provided : /var/lib/yarn-ce/etc/hadoop/container-executor.cfg

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
        at org.apache.hadoop.util.Shell.run(Shell.java:507)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)


I've read various articles about permissions and so on, but none of the suggested fixes work. The file /var/lib/yarn-ce/etc/hadoop/container-executor.cfg seems to be recreated every time I attempt to start the NodeManager. I know there were some bug fixes in 5.15 and 5.16 relating to this.

Any help would be greatly appreciated, as the cluster is currently down.

1 ACCEPTED SOLUTION

Explorer

 

Make sure that the nosuid flag isn't set on the /var (or /var/lib) mount point in /etc/fstab.

 

As of this release, the container-executor has moved to /var/lib/yarn-ce, which for many users will be on a different mount than it was previously (perhaps /opt or /usr).
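
For anyone who wants to check this quickly on each node, here is a rough Python sketch (not an official tool). It reads /proc/mounts, finds the mount that covers the container-executor path from the log above, and reports whether nosuid is among its options. The function names are made up for illustration, and it assumes a Linux host with plain mount point paths (no escaped spaces). The reason this matters is that container-executor is normally installed as a setuid binary, and a nosuid mount makes the kernel ignore the setuid bit.

```python
import os

def parse_mounts(mounts_text):
    """Return (mount_point, options) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entries.append((fields[1], fields[3].split(",")))
    return entries

def covering_mount(path, entries):
    """Pick the most specific mount point that contains path.

    On ties, the later entry wins, since later mounts shadow earlier ones.
    """
    path = os.path.normpath(path)
    best = None
    for mount_point, options in entries:
        mp = os.path.normpath(mount_point)
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            if best is None or len(mp) >= len(best[0]):
                best = (mp, options)
    return best

def has_nosuid(path, mounts_text):
    """True if the mount covering path carries the nosuid option."""
    match = covering_mount(path, parse_mounts(mounts_text))
    return match is not None and "nosuid" in match[1]

if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        text = f.read()
    # Path taken from the NodeManager log in the question.
    target = "/var/lib/yarn-ce/bin/container-executor"
    state = "nosuid is set" if has_nosuid(target, text) else "nosuid is not set"
    print(target + ": " + state)
```

If it reports that nosuid is set, the usual remedy is to remove nosuid from the relevant entry in /etc/fstab and remount that filesystem (or reboot) before restarting the NodeManagers. Check with whoever owns the host hardening first, since nosuid on /var is often set deliberately.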

 

This should probably be in the release notes for v5.16, as it isn't clear that the default location of container-executor has moved or what implications the move has.

 

Matt

 
