Zookeeper not running

Contributor

Apologies if this is a basic question, but I seem to have a problem getting my ZooKeeper to run.

I discovered the problem when I couldn't get a bunch of my services to run. Looking at the log files, there seems to be a recurring theme.

HiveServer2-

2018-05-18 08:33:38,723 FATAL [main]: server.HiveServer2 (HiveServer2.java:addServerInstanceToZooKeeper(217)) - Unable to create HiveServer2 namespace: hiveserver2 on ZooKeeper

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

Yarn-

2018-05-21 14:41:55,687 WARN availability.MetricCollectorHAHelper (MetricCollectorHAHelper.java:findLiveCollectorHostsFromZNode(90)) - Unable to connect to zookeeper.

org.apache.hadoop.metrics2.sink.relocated.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ambari-metrics-cluster

Kafka-

[2018-05-18 08:25:36,626] INFO shutting down (kafka.server.KafkaServer)

[2018-05-18 08:25:36,630] INFO shut down completed (kafka.server.KafkaServer)

[2018-05-18 08:25:36,630] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)

org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 25000

So I went back to check ZooKeeper on each of my machines and discovered this:

[mike_w_wong@slave1 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/hdp/current/zookeeper-server/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

From the docs, I tried to get ZK running:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/ref-94...

But I'm not having any luck.

Can anyone help??

Thanks!

1 ACCEPTED SOLUTION

@Mike Wong

1. Is this a new or an existing cluster? How many nodes are in the cluster in total?

2. Please provide the output of the following command from all ZooKeeper server nodes:

echo 'stat' | nc <ZK_HOST> 2181

KeeperExceptions are often caused by a large number of znodes that the various services have created in ZooKeeper. Also check zoo.cfg on every ZK node and verify that the file is identical across all nodes and that the hostnames it uses for the ZK nodes are identical as well.
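
To run that check across several nodes at once, something like the following might help. It is only a sketch: the function names and the state labels are made up for illustration, and it assumes nc is installed and that the client port is 2181 as in the command above.

```shell
#!/bin/sh
# Sketch: classify the reply a ZooKeeper node gives to the 'stat' four-letter command.
# Function names and state labels are illustrative, not part of ZooKeeper.

classify_stat_reply() {
    # $1 = raw text returned by: echo 'stat' | nc <ZK_HOST> 2181
    case "$1" in
        "")
            echo "unreachable" ;;   # nothing came back: port closed or process down
        *"not currently serving requests"*)
            echo "not-serving" ;;   # process is up but is not serving clients yet
        *"Mode: leader"*|*"Mode: follower"*|*"Mode: observer"*|*"Mode: standalone"*)
            echo "serving" ;;       # node reports a role, so it is answering clients
        *)
            echo "unknown" ;;
    esac
}

check_zk_hosts() {
    # Query every host passed as an argument (assumes nc and client port 2181).
    for host in "$@"; do
        reply=$(echo 'stat' | nc -w 5 "$host" 2181 2>/dev/null)
        printf '%s: %s\n' "$host" "$(classify_stat_reply "$reply")"
    done
}

# Example use:
#   check_zk_hosts <ZK_HOST_1> <ZK_HOST_2> <ZK_HOST_3>
```

A "not-serving" result on every node usually points at the ensemble failing to form a quorum rather than at a single dead process.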


11 REPLIES

Master Mentor

@Mike Wong

If your Ambari is up and running, try using the Ambari UI. See the attached screenshot.

Please report back.


mikewong.jpg

Contributor

The really confusing part for me is that Ambari shows ZK as running.

74507-screen-shot-2018-05-21-at-113524-am.png

Contributor

@Geoffrey Shelton Okot

I ran a ZK service check and the smoke test failed:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ZOOKEEPER/3.4.5/package/scripts/service_check.py", line 73, in <module>
    ZookeeperServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/ZOOKEEPER/3.4.5/package/scripts/service_check.py", line 59, in service_check
    logoutput=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/var/lib/ambari-agent/tmp/zkSmoke.sh /usr/hdp/current/zookeeper-client/bin/zkCli.sh ambari-qa /usr/hdp/current/zookeeper-client/conf 2181 False kinit no_keytab no_principal /var/lib/ambari-agent/tmp/zkSmoke.out' returned 3. zk_node1=hdp.c.my-project-1519895027175.internal
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:708)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:703)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Running test on host hdp.c.my-project-1519895027175.internal
Connecting to hdp.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: hdp.c.my-project-1519895027175.internal:2181(CONNECTING) 0] get /zk_smoketest
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Connecting to hdp.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: hdp.c.my-project-1519895027175.internal:2181(CONNECTING) 0] ls /
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:737)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Data associated with znode /zk_smoketests is not consistent on host hdp.c.my-project-1519895027175.internal
Running test on host slave1.c.my-project-1519895027175.internal
Connecting to slave1.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: slave1.c.my-project-1519895027175.internal:2181(CONNECTING) 0] get /zk_smoketest
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Connecting to slave1.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: slave1.c.my-project-1519895027175.internal:2181(CONNECTING) 0] ls /
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:737)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Data associated with znode /zk_smoketests is not consistent on host slave1.c.my-project-1519895027175.internal
Running test on host slave2.c.my-project-1519895027175.internal
Connecting to slave2.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: slave2.c.my-project-1519895027175.internal:2181(CONNECTING) 0] get /zk_smoketest
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Connecting to slave2.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: slave2.c.my-project-1519895027175.internal:2181(CONNECTING) 0] ls /
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:737)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1184)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Data associated with znode /zk_smoketests is not consistent on host slave2.c.my-project-1519895027175.internal
Connecting to hdp.c.my-project-1519895027175.internal:2181
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: hdp.c.my-project-1519895027175.internal:2181(CONNECTING) 0] delete /zk_smoketest
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /zk_smoketest
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:708)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:596)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:368)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:328)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:287)
Zookeeper Smoke Test: Failed

Master Mentor

@Mike Wong

Stop ZooKeeper through the Ambari UI.

Then check for any rogue ZooKeeper process that is still running. Remember that you tried to start it manually.

$ ps -ef | grep zookeeper

Note the PID from the above output.

$ kill -9 <PID>

Clean the .log and .out logs in /var/log/zookeeper

# truncate --size 0 zookeeper.log
# truncate --size 0 zookeeper-zookeeper-server-FQDN.out

Restart through Ambari UI

Check for any errors in the above logs
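
The process check in the steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names are illustrative, and it assumes the server JVM was started with the standard ZooKeeper main class QuorumPeerMain. It tries a plain kill before falling back to kill -9.

```shell
#!/bin/sh
# Sketch: find leftover ZooKeeper server JVMs after stopping the service in Ambari.
# Assumes the server runs the main class QuorumPeerMain; function names are illustrative.

zk_pids_from_ps() {
    # Reads "PID ARGS..." lines (as printed by: ps -eo pid,args) on stdin and
    # prints the PID of every line whose command mentions QuorumPeerMain.
    # The [Q] bracket keeps this awk process from matching its own command line.
    awk '/[Q]uorumPeerMain/ && $1 ~ /^[0-9]+$/ { print $1 }'
}

stop_rogue_zk() {
    # Ask each leftover process to exit cleanly first, and force it only if needed.
    for pid in $(ps -eo pid,args | zk_pids_from_ps); do
        kill "$pid" 2>/dev/null
        sleep 5
        # kill -0 sends no signal; it only tests whether the process still exists
        if kill -0 "$pid" 2>/dev/null; then
            kill -9 "$pid"
        fi
    done
}

# Example use (as root or the zookeeper user, after stopping ZK in Ambari):
#   ps -eo pid,args | zk_pids_from_ps    # list candidates first
#   stop_rogue_zk
```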

Contributor

One thing I noticed, in the zookeeper log folder, the name of the .out file is NOT the full FQDN, just the machine name:

hdp machine-

-rw-r--r--.  1 zookeeper hadoop     3993 May 21 19:40 zookeeper.log
-rw-r--r--.  1 zookeeper hadoop 10485997 May 21 17:47 zookeeper.log.1 
-rw-r--r--.  1 zookeeper hadoop     3993 May 21 19:40 zookeeper-zookeeper-server-hdp.out

slave1

-rw-r--r--.  1 zookeeper hadoop        0 May 21 19:38 zookeeper.log
-rw-r--r--.  1 zookeeper hadoop 10486239 May 20 06:15 zookeeper.log.1
-rw-r--r--.  1 zookeeper hadoop        0 May 21 19:38 zookeeper-zookeeper-server-slave1.out

Contributor

Also, looking at the log file after restarting ZK-

2018-05-21 19:44:50,301 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /35.231.170.209:42398
2018-05-21 19:44:50,301 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-05-21 19:44:50,301 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /35.231.170.209:42398 (no session established for client)
2018-05-21 19:44:53,030 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /35.231.170.209:42408
2018-05-21 19:44:53,031 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2018-05-21 19:44:53,031 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /35.231.170.209:42408 (no session established for client)
2018-05-21 19:44:54,354 - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383] - Cannot open channel to 3 at election address slave2.c.my-project-1519895027175.internal/35.231.220.224:3888
java.net.SocketTimeoutException: connect timed out

Again, Ambari still shows ZK to be running.
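
One possible reading of the "Cannot open channel ... connect timed out" warning is that the election port on the peer is not reachable from this node, for example because of a firewall rule or because the hostname resolves to an address the peer is not listening on. A rough way to test that is sketched below. The function names are illustrative, and the probe assumes bash (for /dev/tcp) and the coreutils timeout command are available.

```shell
#!/bin/sh
# Sketch: list the quorum peers this node's log says it cannot reach, then probe them.
# Function names are illustrative.

unreachable_peers() {
    # Reads zookeeper.log lines on stdin and prints "host port" once per peer that
    # appears in a "Cannot open channel ... at election address" warning.
    sed -n 's#.*at election address \([^/ ]*\)/[^:]*:\([0-9][0-9]*\).*#\1 \2#p' | sort -u
}

probe_port() {
    # Exit status 0 if a TCP connection to host $1, port $2 opens within 3 seconds.
    # Relies on bash's /dev/tcp and on the coreutils timeout command.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example use on a ZooKeeper node:
#   unreachable_peers < /var/log/zookeeper/zookeeper.log | while read -r host port; do
#       probe_port "$host" "$port" && echo "$host:$port open" || echo "$host:$port blocked"
#   done
```

If the probe reports the port as blocked from one node but open locally on the peer, the network path between the nodes is the more likely culprit than ZooKeeper itself.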

Master Mentor

@Mike Wong

Either of these two commands should show you the correct form of the hostnames.

$ hostname -f

or

$ cat /etc/hosts

Can you correct the names and restart the ZooKeeper servers? Have you checked in the Ambari UI --> Zookeeper --> Config parameter --> Zookeeper Server --> Zookeeper Server host(s)?

Do the names correspond?
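
A quick way to compare the two on each node might look like the sketch below. It assumes zoo.cfg uses the usual server.N=host:port:port entries; the function names are illustrative, and the config path in the usage line is the one zkServer.sh reported earlier in this thread.

```shell
#!/bin/sh
# Sketch: check that this node's FQDN appears in the server.N list of zoo.cfg.
# Function names are illustrative.

zoo_cfg_hosts() {
    # Reads zoo.cfg on stdin; prints the host part of each "server.N=host:port:port" line.
    sed -n 's/^[[:space:]]*server\.[0-9][0-9]*[[:space:]]*=[[:space:]]*\([^:]*\):.*/\1/p'
}

host_in_zoo_cfg() {
    # $1 = hostname to look for, $2 = path to zoo.cfg
    # grep -Fx requires an exact whole-line match, so a short name will not match an FQDN.
    zoo_cfg_hosts < "$2" | grep -Fxq -- "$1"
}

# Example use:
#   host_in_zoo_cfg "$(hostname -f)" /usr/hdp/current/zookeeper-server/conf/zoo.cfg \
#       && echo "FQDN is listed" || echo "FQDN is missing from zoo.cfg"
```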

Contributor

@Gaurav Sharma

  • It is a new cluster. Four total nodes
  • [mike_w_wong@hdp ~]$ sudo echo 'stat'| nc hdp.c.my-project-1519895027175.internal 2181
    This ZooKeeper instance is not currently serving requests
  • [mike_w_wong@slave1 ~]$ sudo echo 'stat'| nc slave1.c.my-project-1519895027175.internal 2181
    This ZooKeeper instance is not currently serving requests
  • [mike_w_wong@slave2 ~]$ sudo echo 'stat'| nc slave2.c.my-project-1519895027175.internal 2181
    This ZooKeeper instance is not currently serving requests

Checking the zoo.cfg now
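
For the zoo.cfg comparison, one option is to copy the file from each node to a single machine and compare checksums. The sketch below assumes the copies already exist locally (for example fetched with scp); the function name and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that copies of zoo.cfg gathered from each node are identical.
# Assumes the files were already copied locally; names are illustrative.

distinct_configs() {
    # Prints how many distinct contents exist among the files given as arguments.
    for f in "$@"; do
        cksum < "$f"    # reading from stdin keeps the file name out of the output
    done | sort -u | wc -l | tr -d ' '
}

# Example use:
#   [ "$(distinct_configs zoo.cfg.hdp zoo.cfg.slave1 zoo.cfg.slave2)" -eq 1 ] \
#       && echo "all zoo.cfg copies match" || echo "zoo.cfg differs between nodes"
```

If the count is greater than 1, a plain diff between two of the copies shows which entries differ.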