Support Questions
Find answers, ask questions, and share your expertise

YARN Node Managers failing to start after enabling docker

Cloudera Employee

I am receiving the following message from each node manager when attempting to start YARN after enabling docker. What is the root cause?

2018-11-02 18:28:50,974 INFO  recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:checkVersion(1662)) - Loaded NM state version info 1.2
2018-11-02 18:28:51,174 INFO  resources.ResourceHandlerModule (ResourceHandlerModule.java:initNetworkResourceHandler(182)) - Using traffic control bandwidth handler
2018-11-02 18:28:51,193 WARN  resources.CGroupsBlkioResourceHandlerImpl (CGroupsBlkioResourceHandlerImpl.java:checkDiskScheduler(101)) - Device vda does not use the CFQ scheduler; disk isolation using CGroups will not work on this partition.

2018-11-02 18:28:51,199 INFO  resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(410)) - Initializing mounted controller blkio at /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,199 INFO  resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(420)) - Yarn control group does not exist. Creating /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,200 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems! 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:blkio Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/blkio/hadoop-yarn 
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsBlkioResourceHandlerImpl.bootstrap(CGroupsBlkioResourceHandlerImpl.java:123)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2018-11-02 18:28:51,205 INFO  service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	... 3 more
2018-11-02 18:28:51,207 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
	... 3 more

1 ACCEPTED SOLUTION

Cloudera Employee

Turns out, the guide that I was following was outdated. I tried this again on a different cluster and it worked perfectly with the Ambari default for yarn.nodemanager.linux-container-executor.cgroups.mount-path ("/cgroup")

View solution in original post

2 REPLIES 2

Cloudera Employee

I was able to work around this error by running:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn

I then received a very similar message for "/sys/fs/cgroup/memory/hadoop-yarn" and "/sys/fs/cgroup/cpu/hadoop-yarn". After creating these directories as well, the node managers came up.

Here is the full work-around that was run on each node:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn
sudo mkdir /sys/fs/cgroup/memory/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/memory/hadoop-yarn
sudo mkdir /sys/fs/cgroup/cpu/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/cpu/hadoop-yarn

Cloudera Employee

Turns out, the guide that I was following was outdated. I tried this again on a different cluster and it worked perfectly with the Ambari default for yarn.nodemanager.linux-container-executor.cgroups.mount-path ("/cgroup")

; ;