Created on 12-04-2018 05:42 AM - edited 09-16-2022 06:57 AM
Hi,
I've just upgraded the cluster from 5.14 to 5.16, but none of the NodeManagers will start. They fail with this error:
2018-12-04 13:26:15,283 DEBUG org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: checkLinuxExecutorSetup: [/var/lib/yarn-ce/bin/container-executor, --checksetup]
2018-12-04 13:26:15,287 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: Invalid conf file provided : /var/lib/yarn-ce/etc/hadoop/container-executor.cfg
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
    at org.apache.hadoop.util.Shell.run(Shell.java:507)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:267)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:562)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
I've read various articles about permissions and tried what they suggest, but nothing has worked. The file /var/lib/yarn-ce/etc/hadoop/container-executor.cfg seems to be recreated every time I attempt to start the NodeManager. I know there were some bug fixes in 5.15 and 5.16 relating to this.
Any help would be greatly appreciated, as the cluster is currently down.
Created 12-04-2018 06:27 AM
Make sure that the nosuid flag isn't set on the /var (or /var/lib) mount point in /etc/fstab.
As of this release, the container-executor has moved to /var/lib/yarn-ce, which for many users will be on a different mount than it was previously (perhaps /opt or /usr).
This should probably be in the release notes for v5.16, as it isn't clear that the default location of container-executor has moved, or what implications that move may have.
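The nosuid check described above can be scripted. The following is only a rough sketch, not an official Cloudera tool: it assumes a Linux host, and it reads /proc/mounts (the options currently in effect) rather than /etc/fstab, since an edit to fstab only applies after a remount or reboot. The function names find_mount and has_nosuid are made up for illustration. The likely reason the flag matters is that container-executor is normally installed as a setuid binary, and a nosuid mount ignores the setuid bit.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, options) for the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fstype options dump pass
    The longest matching mount point wins; on a tie the later entry wins,
    because later mounts shadow earlier ones.
    Note: mount points containing spaces are escaped in /proc/mounts
    (e.g. \\040) and are not handled by this sketch.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def has_nosuid(path, mounts_text):
    """True if the mount containing `path` carries the nosuid option."""
    mount = find_mount(path, mounts_text)
    return mount is not None and "nosuid" in mount[1]


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    target = "/var/lib/yarn-ce"  # location named in this thread
    mount = find_mount(target, mounts)
    if mount is None:
        print("No mount found for", target)
    elif "nosuid" in mount[1]:
        print(target, "is on", mount[0], "which is mounted nosuid")
    else:
        print(target, "is on", mount[0], "without nosuid")
```

If the script reports nosuid, remove the flag from the matching /etc/fstab entry and remount that filesystem (or reboot) before restarting the NodeManagers.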
Matt