Resource Manager down - /rmstore error

Rising Star

The ResourceManager is down with the error below:

2017-03-28 13:32:05,609 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1229)) - Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore

I saw something earlier about formatting/removing /rmstore, but I can't locate it now.

I would appreciate a detailed answer on how to fix this issue.

8 REPLIES

Guru

@n c, this error is related to the ZooKeeper connection. Can you please make sure ZooKeeper is up and running fine?

You can also try restarting the ZooKeeper servers and the RM to check whether that resolves the issue.

Rising Star

Yes, I already restarted ZooKeeper and the ResourceManager. It did not help. Thanks.

Guru

The link below shows how to clear the /rmstore znode:

https://community.hortonworks.com/questions/46703/resource-manager-failed-to-start.html

Contributor

First, verify that the ZooKeeper ensemble is up. A ZooKeeper daemon being up and running does not mean there is an "ensemble".

Can you connect to ZooKeeper?

zkCli.sh -server localhost:2181 (change to the address where it runs)

[zk: localhost:2181(CONNECTED) 0] ls /

This lists all znodes. Can you see "rmstore" there?

If so, you can delete it with rmr /rmstore

Then restart ZooKeeper and the RM.
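
If zkCli.sh itself cannot connect, a quicker probe is ZooKeeper's "ruok" four-letter command, which a running server answers with "imok". Below is a minimal Python sketch of that probe. The function name zk_ruok and the defaults are my own, not from this thread. It assumes four-letter commands are enabled on the server (ZooKeeper 3.5 and later require them to be whitelisted via 4lw.commands.whitelist). An "imok" reply only shows the process is responding on the client port; it does not by itself prove the server has joined a quorum.

```python
import socket


def zk_ruok(host="localhost", port=2181, timeout=5.0):
    """Return True if the server replies 'imok' to the 'ruok' command.

    Hypothetical helper: host, port and timeout are illustrative defaults.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            # The server sends its reply and then closes the connection,
            # so read until end of stream.
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks) == b"imok"
    except OSError:
        # Connection refused, reset, or timed out.
        return False


# Example (replace the host with one of your ZooKeeper servers):
# print(zk_ruok("localhost", 2181))
```

Running it against each ZooKeeper host in turn shows which servers respond at all before you look at /rmstore.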

Rising Star

When I run zkCli.sh -server <hostname>:2181

I get this message repeatedly:

2017-03-31 13:42:05,992 - INFO [main-SendThread(hdtesting2.com.local:2181):ClientCnxn$SendThread@1142] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect

Contributor

So it means that the ZooKeeper ensemble is not up. How many nodes do you have in the ZooKeeper ensemble? Make sure the server-to-IP mapping (the server.N entries in zoo.cfg) and the myid on each node match. Please paste your zoo.cfg here, along with the output of:

netstat -tlpn | grep 2181
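
To make the mapping and myid check concrete: in a replicated setup, zoo.cfg lists every member as server.N=host:peerPort:electionPort, and each node has a file named myid in its dataDir containing only its own N. The Python sketch below checks that a node's myid has a matching server.N line. The function names and the example hostnames (zk1.example.com and so on) are hypothetical, and ports 2888/3888 are only the conventional values.

```python
def parse_servers(zoo_cfg_text):
    """Return {server_id: host} for every 'server.N=host:port:port' line."""
    servers = {}
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("server."):
            server_id = int(key[len("server."):])
            servers[server_id] = value.strip().split(":")[0]
    return servers


def check_myid(zoo_cfg_text, myid_text):
    """Describe whether the myid value matches a server.N entry."""
    servers = parse_servers(zoo_cfg_text)
    if not servers:
        return "no server.N entries: this config describes a standalone server"
    myid = int(myid_text.strip())
    if myid not in servers:
        return f"myid {myid} has no matching server.{myid} line"
    return f"ok: myid {myid} maps to {servers[myid]}"


# Hypothetical three-node ensemble, for illustration only.
EXAMPLE_CFG = """\
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
"""

# Example: check_myid(EXAMPLE_CFG, "2") reports that myid 2 maps to zk2.example.com.
```

On a real node you would pass in the contents of its zoo.cfg and of the myid file under dataDir, and repeat the check on every ZooKeeper host.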

Rising Star

Thanks for the response.

In Ambari -> ZooKeeper, I can see three ZooKeeper Server entries and two ZooKeeper Clients installed.

I'm not sure about the server IP mapping and myid match. Can you give more detail, please?

[root@hdtesting1 etc]# netstat -tlpn | grep 2181

tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 11297/java

zoo.cfg:

maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181

Guru

@n c, there can be multiple reasons for this issue.

1) Make sure that you have an odd number of ZooKeeper servers (for example, 3).

2) Also make sure that the ports the ZooKeeper servers listen on are open and actually in use by the ZooKeeper processes.

3) Check the firewall settings between hosts to make sure they can communicate with each other.
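
The three checks above can be roughly scripted. The Python sketch below is only an illustration: the helper names are my own, and ports 2888 and 3888 are the conventional peer and leader-election ports, assumed here because they are not shown in this thread. It computes the majority an ensemble needs and tests whether each port accepts TCP connections from the machine where the script runs.

```python
import socket

# 2181 is the client port from this thread; 2888 and 3888 are assumed defaults.
ZK_PORTS = (2181, 2888, 3888)


def quorum_size(ensemble_size):
    """Servers that must be up for a majority, e.g. 2 of 3 or 3 of 5."""
    return ensemble_size // 2 + 1


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, ports=ZK_PORTS):
    """Map each (host, port) pair to whether it accepted a connection."""
    # As far as I know, only the current leader listens on the peer port
    # (2888), so a closed 2888 on the other servers is expected.
    return {(h, p): port_open(h, p) for h in hosts for p in ports}


# Example (hostnames are placeholders for your own ZooKeeper servers):
# for (host, port), ok in check_hosts(["zk1", "zk2", "zk3"]).items():
#     print(host, port, "open" if ok else "closed or filtered")
```

Run it from each ZooKeeper host in turn, since a firewall may block traffic in one direction only. Note that quorum_size(4) is 3, so a fourth server tolerates no more failures than three do, which is why odd sizes are recommended.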

This is also a good read:

http://stackoverflow.com/questions/13316776/zookeeper-connection-error