Created 03-30-2017 03:10 PM
The ResourceManager is down due to the error below:
2017-03-28 13:32:05,609 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1229)) - Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore
I saw something earlier about formatting or removing /rmstore, but I can't locate it now.
I would appreciate it if somebody could provide a detailed answer on how to fix this issue.
Created 03-30-2017 06:16 PM
@n c, this error is related to the ZooKeeper connection. Can you please make sure that ZooKeeper is up and running properly?
You can also try restarting the ZooKeeper servers and the RM to check whether that resolves the issue.
Created 03-30-2017 08:21 PM
Yes, I already restarted ZooKeeper and the ResourceManager. It did not help. Thanks.
Created 03-30-2017 08:58 PM
The link below shows how to clear the /rmstore znode:
https://community.hortonworks.com/questions/46703/resource-manager-failed-to-start.html
Created 03-31-2017 07:00 AM
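First, verify that the ZooKeeper ensemble is up. A ZooKeeper daemon being up and running does not mean there is an "ensemble", that is, a majority of the servers talking to each other.
Can you connect to ZooKeeper?
zkCli.sh -server localhost:2181 (change this to the address where it runs)
[zk: localhost:2181(CONNECTED) 0] ls /
This lists all znodes. Can you see "rmstore" there?
If so, you can delete it with rmr /rmstore (on ZooKeeper 3.5 and later, rmr is deprecated in favor of deleteall). Note that this discards the application state the RM stored there for recovery.
Then restart ZooKeeper and the RM.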
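To tell a running daemon apart from a server that has actually joined the ensemble, one option is ZooKeeper's "stat" four-letter command, which reports a Mode line (leader, follower, or standalone) only when the server is serving requests. The following is a minimal sketch, not an official tool: the function names are made up for illustration, it assumes nc is installed, and on ZooKeeper 3.5+ it assumes "stat" is allowed by 4lw.commands.whitelist.

```shell
#!/usr/bin/env bash
# Extract the "Mode:" value (leader, follower, or standalone) from the
# output of ZooKeeper's "stat" four-letter command, read from stdin.
zk_parse_mode() {
    awk -F': ' '/^Mode:/ { print $2 }'
}

# Ask one server for its mode. Prints nothing if the server does not answer
# or replies that it is not currently serving requests (no quorum).
zk_mode() {
    local host="$1" port="${2:-2181}"
    echo stat | nc -w 2 "$host" "$port" 2>/dev/null | zk_parse_mode
}

# Example (hypothetical host names, replace with your ZooKeeper servers):
#   for h in zk1.example.com zk2.example.com zk3.example.com; do
#       echo "$h: $(zk_mode "$h")"
#   done
```

In a healthy three-node ensemble you would expect one leader and two followers. An empty result for every host suggests that no quorum has formed.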
Created 03-31-2017 05:43 PM
When I run zkCli.sh -server <hostname>:2181
I get this message repeatedly:
2017-03-31 13:42:05,992 - INFO [main-SendThread(hdtesting2.com.local:2181):ClientCnxn$SendThread@1142] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
Created 04-01-2017 09:34 AM
So the ZooKeeper ensemble is not up. How many nodes do you have in the ZooKeeper ensemble? Make sure the server.N=host mappings in zoo.cfg match the myid file on each node. Please paste your zoo.cfg here,
and the output of:
netstat -tlpn | grep 2181
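For reference, the mapping works like this: zoo.cfg on every node lists each server as server.N=host:port:port, and the file named myid in that node's dataDir contains only that node's N. A rough sketch of the check, run on each ZooKeeper node, could look like the following. The function names and the example paths are assumptions for illustration, and the host name passed in must be spelled exactly as it appears in zoo.cfg.

```shell
#!/usr/bin/env bash
# Print the id N from the "server.N=host:port:port" line in zoo.cfg ($1)
# whose host equals $2. Prints nothing if no such line exists.
zk_expected_id() {
    local cfg="$1" host="$2"
    awk -v h="$host" '
        /^server\.[0-9]+=/ {
            split($0, kv, "=")        # kv[1] = "server.N", kv[2] = "host:port:port"
            split(kv[2], addr, ":")   # addr[1] = host
            if (addr[1] == h) { sub(/^server\./, "", kv[1]); print kv[1] }
        }' "$cfg"
}

# Compare the id expected from zoo.cfg ($1) with the myid file in dataDir ($2)
# for the host name $3. Returns non-zero on a missing entry or a mismatch.
zk_check_myid() {
    local cfg="$1" data_dir="$2" host="$3" expected actual
    expected="$(zk_expected_id "$cfg" "$host")"
    actual="$(cat "$data_dir/myid" 2>/dev/null || true)"
    if [ -z "$expected" ]; then
        echo "no server.N entry for $host in $cfg"
        return 1
    elif [ "$expected" != "$actual" ]; then
        echo "mismatch: zoo.cfg says $expected, myid contains '$actual'"
        return 1
    fi
    echo "ok: $host is server.$expected"
}

# Example (paths are assumptions, adjust to your installation):
#   zk_check_myid /etc/zookeeper/conf/zoo.cfg /var/lib/zookeeper "$(hostname -f)"
```

If zoo.cfg contains no server.N lines at all, the server runs in standalone mode rather than as part of an ensemble.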
Created 04-03-2017 01:12 PM
Thanks for the response.
In Ambari -> ZooKeeper, I can see three entries for ZooKeeper Server and 2 ZooKeeper Clients installed.
I'm not sure about the server IP mapping and myid match. Can you give more detail, please?
[root@hdtesting1 etc]# netstat -tlpn | grep 2181
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 11297/java
zoo.cfg :
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
Created 03-31-2017 08:24 PM
@n c, there can be multiple reasons for this issue.
1) Make sure that you have an odd number of ZooKeeper servers (for example, 3).
2) Also make sure that the ports the ZooKeeper servers are supposed to listen on are open and are actually being used by the ZooKeeper processes.
3) Check the firewall settings between the hosts to make sure they can communicate with each other.
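The checks above can be sketched in a few lines of bash. This is only an illustration under stated assumptions: the function names are hypothetical, the port check relies on bash's /dev/tcp redirection and the coreutils timeout command, and 2888/3888 are the conventional quorum and leader-election ports, so confirm the real ones in the server.N lines of your zoo.cfg.

```shell
#!/usr/bin/env bash
# Smallest number of servers that forms a majority (quorum) in an
# ensemble of $1 servers.
zk_quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a quorum still remains.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Return 0 if a TCP connection to host $1, port $2 succeeds within 2 seconds.
zk_port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (hypothetical host names). The quorum port, 2888 here, is normally
# open only on the current leader, so seeing it closed on followers is expected.
#   for h in zk1.example.com zk2.example.com zk3.example.com; do
#       for p in 2181 2888 3888; do
#           if zk_port_open "$h" "$p"; then echo "$h:$p open"; else echo "$h:$p closed"; fi
#       done
#   done
```

The arithmetic shows why odd sizes are recommended: a 4-server ensemble needs 3 servers for a quorum and so tolerates one failure, the same as a 3-server ensemble.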
This is also a good read:
http://stackoverflow.com/questions/13316776/zookeeper-connection-error