Resource manager failed to start

The ResourceManager goes down after a restart; the log follows. The configs all look correct to me, based on the suggestions given for similar issues.

2016-07-22 18:09:17,419 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server hadoop1.dev.com/192.168.xx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-22 18:09:17,420 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to hadoop1.dev.com/192.168.xx.xxx:2181, initiating session
2016-07-22 18:09:17,524 INFO zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x0 closed
2016-07-22 18:09:17,524 INFO zookeeper.ZooKeeper (ZooKeeper.java:<init>(438)) - Initiating client connection, connectString=hadoop1.dev.com:2181 sessionTimeout=10000 watcher=null
2016-07-22 18:09:17,524 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2016-07-22 18:09:17,528 INFO zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket connection to server hadoop1.dev.com/192.168.xx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-22 18:09:17,529 INFO zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection established to hadoop1.dev.com/192.168.xx.xxx:2181, initiating session
2016-07-22 18:09:17,529 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:createConnection(1205)) - Created new ZK connection
2016-07-22 18:09:17,536 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1098)) - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2016-07-22 18:09:17,639 INFO recovery.ZKRMStateStore (ZKRMStateStore.java:runWithRetries(1164)) - Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:309)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$1.run(ZKRMStateStore.java:305)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1125)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1146)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createRootDir(ZKRMStateStore.java:305)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.startInternal(ZKRMStateStore.java:288)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStart(RMStateStore.java:657)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:586)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1019)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1060)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1056)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1056)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1096)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1231)

Could someone please help me solve the issue? Thanks, Sree

2 REPLIES

Re: Resource manager failed to start

The logs suggest a problem with ZooKeeper. Check whether ZooKeeper itself is reporting any issues.
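
One way to run that check from the ResourceManager host is to send ZooKeeper a four-letter command such as `srvr` on the client port. The sketch below is only an illustration: the helper names `four_letter_word` and `is_serving` are made up here, the port 2181 is taken from the log above, and newer ZooKeeper releases restrict these commands through the `4lw.commands.whitelist` setting, so an empty reply does not always mean the server is down.

```python
import socket

# Text a ZooKeeper server returns to four-letter commands while it is
# running but not part of a quorum (for example, still electing a leader).
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host, port=2181, command="srvr", timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply.

    The server writes its answer and then closes the connection, so we
    read until recv() returns an empty byte string.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(reply):
    """Return True if the reply looks like it came from a serving node.

    An empty reply (command rejected or connection dropped) and the
    "not currently serving requests" message both count as not serving.
    """
    return bool(reply.strip()) and NOT_SERVING not in reply
```

For example, `is_serving(four_letter_word("hadoop1.dev.com"))` returning `False` would be consistent with the ResourceManager's session being closed right after the socket connects.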

Re: Resource manager failed to start

Hi Rajkumar,

The following error is in the ZooKeeper logs:

2016-07-22 19:06:03,579 - WARN [WorkerSender[myid=1]:QuorumCnxManager@383] - Cannot open channel to 3 at election address hadoop3.dev.com:3888
java.net.UnknownHostException: hadoop3.dev.com
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:745)
2016-07-22 19:06:03,580 - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x15d00000009 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x15e (n.peerEpoch) LOOKING (my state)
2016-07-22 19:06:32,332 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.xx.xxx:50735
2016-07-22 19:06:32,332 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)
2016-07-22 19:06:32,332 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.xx.xxx:50735 (no session established for client)
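
The `UnknownHostException` for `hadoop3.dev.com` in this log suggests that server 1 cannot resolve one of its quorum peers, which would explain why it stays in the `LOOKING` state and drops client sessions. A rough way to confirm this is to read the `server.N=host:port:port` entries from `zoo.cfg` and try to resolve each host on the machine where ZooKeeper runs. The following is a minimal sketch under that assumption; the function names are hypothetical, and the location and contents of `zoo.cfg` on this cluster are not shown in the thread.

```python
import re
import socket

# Matches quorum entries such as: server.3=hadoop3.dev.com:2888:3888
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+):(\d+):(\d+)")


def parse_quorum_servers(cfg_text):
    """Return {server_id: (host, peer_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            sid, host, peer_port, election_port = match.groups()
            servers[int(sid)] = (host, int(peer_port), int(election_port))
    return servers


def unresolvable_hosts(servers, resolve=socket.getaddrinfo):
    """Return [(server_id, host)] for quorum members whose name does not resolve.

    `resolve` defaults to socket.getaddrinfo; it is a parameter only so the
    lookup can be replaced with a stub when no DNS is available.
    """
    failed = []
    for sid, (host, _peer_port, election_port) in sorted(servers.items()):
        try:
            resolve(host, election_port)
        except socket.gaierror:
            failed.append((sid, host))
    return failed
```

If `unresolvable_hosts(parse_quorum_servers(open(path).read()))` lists server 3, where `path` is wherever `zoo.cfg` lives on that node, the likely next step is to fix the DNS or `/etc/hosts` entry for `hadoop3.dev.com` on every ZooKeeper host and restart the ensemble before restarting the ResourceManager.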
