Contributor
Posts: 48
Registered: 04-26-2017
Accepted Solution

URGENT Cluster unavailable after upgrade to 5.16!

Hi,

I've just upgraded the cluster from 5.14 to 5.16, but none of the NodeManagers will start. They fail with the following error:

2018-12-04 13:26:15,283 DEBUG org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: checkLinuxExecutorSetup: [/var/lib/yarn-ce/bin/container-executor, --checksetup]
2018-12-04 13:26:15,287 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Invalid conf file provided : /var/lib/yarn-ce/etc/hadoop/container-executor.cfg

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
        at org.apache.hadoop.util.Shell.run(Shell.java:507)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)


I've read various articles about permissions etc., none of which seem to work. The file /var/lib/yarn-ce/etc/hadoop/container-executor.cfg seems to be recreated every time I attempt to start the NodeManager. I know there were some bug fixes in 5.15 and 5.16 relating to this.

Any help would be greatly appreciated as the cluster is currently down.

Explorer
Posts: 6
Registered: 04-26-2018

Re: URGENT Cluster unavailable after upgrade to 5.16!

 

Make sure that the nosuid flag isn't set on the /var (or /var/lib) mount point in /etc/fstab.

 

As of this release, container-executor has moved to /var/lib/yarn-ce, which for many users will be on a different mount than it was previously (perhaps /opt or /usr). container-executor is a setuid binary, so it cannot do its job from a filesystem mounted with nosuid.
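
If it helps, here is a rough Python sketch of the check, assuming mount data in the /proc/mounts layout (device, mount point, type, comma-separated options). The function names and the sample mount table are made up for illustration and are not part of any Cloudera or Hadoop tooling. It picks the longest mount point that contains the path and reports whether nosuid is among its options.

```python
import os

# Hypothetical sample in /proc/mounts layout, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sda2 /var ext4 rw,nosuid,nodev 0 0
/dev/sda3 /opt ext4 rw,relatime 0 0
"""

def find_mount(path, mounts_text):
    """Return (mount_point, options) for the mount that contains path, or None."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Longest mount point wins; on a tie the later entry wins,
            # since later mounts shadow earlier ones.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best

def has_nosuid(path, mounts_text):
    """True if the filesystem holding path is mounted with nosuid."""
    mount = find_mount(path, mounts_text)
    return mount is not None and "nosuid" in mount[1]

print(has_nosuid("/var/lib/yarn-ce/bin/container-executor", SAMPLE_MOUNTS))
```

On a real node you would pass in the contents of /proc/mounts instead of the sample, e.g. has_nosuid(path, open("/proc/mounts").read()). The sketch does not handle escaped characters (such as \040 for spaces) in mount point names. If nosuid shows up, remove it from the matching /etc/fstab entry and remount.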

 

This should probably be in the release notes for v5.16, as it isn't clear that the default location of container-executor has moved or what implications that has.

 

Matt

 
