Support Questions

Find answers, ask questions, and share your expertise

ResourceManager intermittently fails with error: $ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore

avatar
Explorer

Hi,

Resource manager fails intermittently and following error message is displayed:

2016-09-28 16:42:20,466 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1211)) - Exception while executing a ZK operation. org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:326) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:322) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1158) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1191) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createRootDir(ZKRMStateStore.java:322) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createRootDirRecursively(ZKRMStateStore.java:1295) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:303) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:563) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:956) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:997) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:993) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:993) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1033) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1177) 2016-09-28 16:42:20,466 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1214)) - Retrying operation on ZK. Retry no. 666 2016-09-28 16:42:20,466 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server VORA1.ad.company.com/10.78.1.240:2181. Will not attempt to authenticate using SASL (unknown error) 2016-09-28 16:42:20,467 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

1 ACCEPTED SOLUTION

avatar
Super Collaborator
2 REPLIES 2

avatar
Super Collaborator

Hi Jyothi, please refer this link hope this will be helpful for you.

http://stackoverflow.com/questions/17533661/zookeeper-error-connection-loss-exception

avatar
Explorer

Thanks it worked 🙂