Resource manager failed to start

The Resource Manager goes down after a restart; the log is below. The configs all look correct to me, based on the suggestions given for similar issues.

2016-07-22 18:09:17,419 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server hadoop1.dev.com/192.168.xx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-22 18:09:17,420 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to hadoop1.dev.com/192.168.xx.xxx:2181, initiating session
2016-07-22 18:09:17,524 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2016-07-22 18:09:17,524 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=hadoop1.dev.com:2181 sessionTimeout=10000 watcher=null
2016-07-22 18:09:17,524 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2016-07-22 18:09:17,528 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server hadoop1.dev.com/192.168.xx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-22 18:09:17,529 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to hadoop1.dev.com/192.168.xx.xxx:2181, initiating session
2016-07-22 18:09:17,529 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:createConnection(1205)) - Created new ZK connection
2016-07-22 18:09:17,536 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-22 18:09:17,639 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1164)) - Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:309)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:305)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1125)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1146)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createRootDir(ZKRMStateStore.java:305)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:288)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:657)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:586)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1019)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1060)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1056)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1056)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1096)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1231)

Could someone please help me solve this issue?

Thanks, Sree

3 REPLIES

Re: Resource manager failed to start

New Contributor

This could be caused by an issue with ZooKeeper. You can try two things:

1) Restart ZK, followed by YARN.

2) If that doesn't fix the issue, clear the /rmstore znode and restart YARN.

Hope this helps.
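
Before restarting anything, it may be worth confirming whether ZooKeeper is answering at all, since the log shows the server closing the socket right after the connection is established. A minimal sketch, assuming netcat is installed and that the server accepts the "ruok" four-letter command (on ZooKeeper 3.5 and later this has to be enabled through 4lw.commands.whitelist). The function name zk_is_ok and the NC variable are hypothetical names used only for illustration:

```shell
# Netcat binary to use; overridable so a different netcat variant can be substituted.
NC="${NC:-nc}"

# zk_is_ok HOST PORT
# Returns 0 if the ZooKeeper server replies "imok" to the "ruok" command,
# non-zero otherwise (no reply, connection refused, command not whitelisted).
zk_is_ok() {
  reply=$(printf 'ruok' | "$NC" -w 2 "${1:-127.0.0.1}" "${2:-2181}" 2>/dev/null)
  [ "$reply" = "imok" ]
}
```

For example, `zk_is_ok hadoop1.dev.com 2181 && echo "ZooKeeper is responding"`. A server can reply "imok" and still reject client sessions, so a passing check only rules out the simplest failure.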

Re: Resource manager failed to start

New Contributor

@Sanjeev Can you tell me how to clear the /rmstore znode and restart YARN?

Re: Resource manager failed to start

New Contributor

1) Stop the Resource Manager.

2) Connect to the ZK server (e.g. $ bin/zkCli.sh -server 127.0.0.1:2181).

3) Remove the znode for the RM: rmr /rmstore

4) Start the Resource Manager again.
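
The znode removal can also be run as a one-shot command instead of an interactive zkCli session. This is only a sketch under a few assumptions: zkCli.sh is invoked from the ZooKeeper install directory, it accepts a command after the -server argument, and the Resource Manager has already been stopped by whatever means your cluster uses (Ambari, or the yarn-daemon.sh script). The variable names and the clear_rm_znode function are hypothetical. Note that rmr is deprecated in ZooKeeper 3.5 and later in favor of deleteall, and that removing /rmstore discards the application state the RM had saved for recovery.

```shell
# All values below are illustrative defaults; override them for your cluster.
ZKCLI="${ZKCLI:-bin/zkCli.sh}"             # path to the ZooKeeper CLI script
ZK_SERVER="${ZK_SERVER:-127.0.0.1:2181}"   # ZooKeeper host:port
RM_ZNODE="${RM_ZNODE:-/rmstore}"           # znode used by the RM state store
ZK_DELETE_CMD="${ZK_DELETE_CMD:-rmr}"      # set to "deleteall" on ZooKeeper 3.5+

# Recursively delete the RM state-store znode.
# Run this only while the Resource Manager is stopped.
clear_rm_znode() {
  "$ZKCLI" -server "$ZK_SERVER" "$ZK_DELETE_CMD" "$RM_ZNODE"
}
```

After calling `clear_rm_znode`, start the Resource Manager again; it should recreate /rmstore on startup.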
