Zookeeper not running


Apologies if this is a basic question, but I can't seem to get ZooKeeper to run.

I discovered the problem when I couldn't get a bunch of my services to run. Looking at the log files, there seems to be a recurring theme.

HiveServer2-


2018-05-18 08:33:38,723 FATAL [main]: server.HiveServer2 (HiveServer2.java:addServerInstanceToZooKeeper(217)) - Unable to create HiveServer2 namespace: hiveserver2 on ZooKeeper

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

Yarn-


2018-05-21 14:41:55,687 WARN availability.MetricCollectorHAHelper (MetricCollectorHAHelper.java:findLiveCollectorHostsFromZNode(90)) - Unable to connect to zookeeper.

org.apache.hadoop.metrics2.sink.relocated.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ambari-metrics-cluster

Kafka-


[2018-05-18 08:25:36,626] INFO shutting down (kafka.server.KafkaServer)

[2018-05-18 08:25:36,630] INFO shut down completed (kafka.server.KafkaServer)

[2018-05-18 08:25:36,630] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)

org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 25000

So, I went back to check ZooKeeper on each of my machines and discovered this:

[mike_w_wong@slave1 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/hdp/current/zookeeper-server/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

I tried to start ZK by following the docs:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/ref-94...

But I'm not having any luck.

Can anyone help??

Thanks!

Accepted solution

@Mike Wong

1. Is this a new or an existing cluster? How many nodes are in the cluster?

2. Please provide the output of the following command from every ZooKeeper server node:

echo 'stat' | nc <ZK_HOST> 2181
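To run the step 2 check against every ZooKeeper node at once, here is a minimal bash sketch. It assumes `nc` is installed and the client port is 2181 (as in the command above); the function names and the script name in the usage comment are made up for illustration. It keeps only the `Mode` line (leader/follower/standalone) and the `Node count` line, which is the znode count the answer refers to. A host that gives no reply matches the "Error contacting service" symptom. Note that on ZooKeeper 3.5.3 and later, `stat` may be disabled unless it is listed in `4lw.commands.whitelist`.

```bash
#!/usr/bin/env bash
# Minimal sketch: send the "stat" four-letter word to each ZooKeeper host
# given on the command line and print a one-line summary per host.

# Reads "stat" output on stdin; prints "Mode=..." and "Node count=..." lines.
zk_stat_summary() {
  awk -F': ' '/^Mode:/ || /^Node count:/ { print $1 "=" $2 }'
}

# Queries one host (port defaults to 2181) with a 5 second timeout.
zk_check_host() {
  local host="$1" port="${2:-2181}" out summary
  out=$(echo stat | nc -w 5 "$host" "$port" 2>/dev/null)
  if [ -z "$out" ]; then
    echo "$host:$port no response (not running, or port blocked?)"
    return 1
  fi
  summary=$(printf '%s\n' "$out" | zk_stat_summary | paste -sd ' ' -)
  echo "$host:$port ${summary:-unexpected reply: $(printf '%s\n' "$out" | head -n 1)}"
}

# Usage (hypothetical script name):
#   bash zk-stat.sh hdp.c.my-project-1519895027175.internal slave1.c.my-project-1519895027175.internal
for host in "$@"; do
  zk_check_host "$host"
done
```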

KeeperExceptions can often be caused by a large number of znodes created in ZooKeeper by various services. Also check zoo.cfg on all ZK nodes, and verify that the file is identical across all nodes and that the ZK hostnames it refers to are identical as well.
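For the "is zoo.cfg identical everywhere" check, a minimal bash sketch that compares copies of the file pulled from each node. How you collect them (for example with scp) and the file names in the usage comment are assumptions. It ignores comment lines, blank lines, surrounding whitespace and line order, and prints a diff for any copy that differs from the first one.

```bash
#!/usr/bin/env bash
# Minimal sketch: compare local copies of zoo.cfg collected from each ZK node.

# Prints the config without comment lines, blank lines or surrounding
# whitespace, sorted so that line order does not matter.
zk_cfg_normalize() {
  sed -e '/^[[:space:]]*#/d' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
    | grep -v '^$' | LC_ALL=C sort
}

# Compares every file against the first one. Returns 0 if all match,
# otherwise prints a unified diff plus a note per differing file and returns 1.
zk_cfg_same() {
  local ref="$1" f rc=0
  shift
  for f in "$@"; do
    if ! diff -u <(zk_cfg_normalize "$ref") <(zk_cfg_normalize "$f"); then
      echo "differs: $f (compared with $ref)" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage (hypothetical file names):
#   zk_cfg_same zoo.cfg.hdp zoo.cfg.slave1 zoo.cfg.slave2 && echo "all identical"
```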


@Gaurav Sharma

I think I figured it out. When you asked me to check zoo.cfg, I noticed a block of ports that ZK communicates over:

server.1=hdp.c.my-project-1519895027175.internal:2888:3888
server.2=slave1.c.my-project-1519895027175.internal:2888:3888
server.3=slave2.c.my-project-1519895027175.internal:2888:3888

In GCP, you have to configure the firewall manually (or that's what I'm doing at least). Once I added that range of ports to the firewall, I restarted the ZK servers and most of my services work now!
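Since the fix here was opening ZooKeeper's ports in the GCP firewall, a small helper that builds the port list straight from zoo.cfg can help avoid missing one. This is a minimal bash sketch assuming the usual `server.N=host:peerPort:electionPort` lines; it also picks up `clientPort` (2181 in the `nc` command earlier in the thread), which the other services use to reach ZK. If `clientPort` is absent from the file, it is not included. The function name, firewall rule name, network and source range in the example comment are placeholders.

```bash
#!/usr/bin/env bash
# Minimal sketch: print the TCP ports a zoo.cfg needs open, formatted for
# gcloud's --allow flag, e.g. "tcp:2181,tcp:2888,tcp:3888".
# Assumption: server lines look like server.N=host:peerPort:electionPort.

zk_allow_spec() {
  awk -F= '
    { sub(/[ \t\r]+$/, "") }            # drop trailing whitespace and CR
    $1 == "clientPort" { ports[$2] = 1 }
    $1 ~ /^server\.[0-9]+$/ {
      split($2, p, ":")                 # host:peerPort:electionPort[:...]
      ports[p[2]] = 1; ports[p[3]] = 1
    }
    END { for (port in ports) if (port != "") print port }
  ' "$1" | sort -n | sed 's/^/tcp:/' | paste -sd, -
}

# Example use (rule name, network and source range are placeholders):
#   gcloud compute firewall-rules create zk-internal --network=default \
#     --allow="$(zk_allow_spec /usr/hdp/current/zookeeper-server/conf/zoo.cfg)" \
#     --source-ranges=<INTERNAL_CIDR>
```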

I say most, because Spark2 History Server and Zeppelin still aren't working. But I'll open another thread for those.


@Geoffrey Shelton Okot

They do match

From Ambari-

[screenshot]

From the VMs-

[mike_w_wong@hdp ~]$ hostname -f
hdp.c.my-project-1519895027175.internal
[mike_w_wong@slave1 ~]$ hostname -f
slave1.c.my-project-1519895027175.internal
[mike_w_wong@slave2 ~]$ hostname -f
slave2.c.my-project-1519895027175.internal
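Beyond `hostname -f`, a quick per-node sanity check is whether that FQDN appears verbatim in a `server.N` line of zoo.cfg, and whether the `myid` file in `dataDir` holds the same N, since ZooKeeper uses `myid` to identify its own entry in the server list. A minimal bash sketch; the function names are hypothetical and it assumes zoo.cfg uses plain `key=value` lines.

```bash
#!/usr/bin/env bash
# Minimal sketch: check that this host's FQDN is listed in zoo.cfg and that
# the myid file in dataDir contains the matching server number.

# Prints N for the server.N line whose host equals $2 exactly, or nothing.
zk_server_id_for() {
  awk -F= -v h="$2" '
    { sub(/[ \t\r]+$/, "") }
    $1 ~ /^server\.[0-9]+$/ {
      split($2, p, ":")
      if (p[1] == h) { sub(/^server\./, "", $1); print $1 }
    }
  ' "$1"
}

# Prints OK and returns 0 if $2 is server.N in $1 and dataDir/myid holds N.
zk_check_self() {
  local cfg="$1" fqdn="$2" id datadir myid
  id=$(zk_server_id_for "$cfg" "$fqdn")
  if [ -z "$id" ]; then
    echo "FAIL: $fqdn is not in any server.N line of $cfg"
    return 1
  fi
  datadir=$(awk -F= '$1 == "dataDir" { print $2 }' "$cfg" | tr -d ' \t\r')
  if [ ! -r "$datadir/myid" ]; then
    echo "FAIL: cannot read $datadir/myid"
    return 1
  fi
  myid=$(tr -d ' \t\r\n' < "$datadir/myid")
  if [ "$myid" != "$id" ]; then
    echo "FAIL: $fqdn is server.$id but $datadir/myid contains '$myid'"
    return 1
  fi
  echo "OK: $fqdn is server.$id and myid matches"
}

# Example use on each ZK node:
#   zk_check_self /usr/hdp/current/zookeeper-server/conf/zoo.cfg "$(hostname -f)"
```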