Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

The Metrics Monitor suddenly stopped working. I tried to restart it, but the restart never completes.

I checked the log files under the /var/log/ambari-metrics-collector folder and found two .log files: 1) hbase-ams-master-ITEM-70288.log and 2) ambari-metrics-collector.log.

Inside hbase-ams-master-ITEM-70288.log I found some errors. The dashboard shows the ZooKeeper and HBase services up and running without any issues, yet the log file shows errors related to ZooKeeper and HBase. I have spent two hours trying to understand this problem and find a solution, so I am posting in this forum for help. I am pasting the error below; this same block of errors is repeated throughout the .log file.

2016-03-17 21:37:57,956 INFO  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-03-17 21:37:57,961 WARN  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-03-17 21:37:58,071 INFO  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-03-17 21:37:58,071 WARN  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-03-17 21:37:58,074 WARN  [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2016-03-17 21:37:59,172 INFO  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-03-17 21:37:59,172 WARN  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-03-17 21:37:59,273 WARN  [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2016-03-17 21:37:59,273 INFO  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-03-17 21:37:59,273 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts
2016-03-17 21:37:59,273 WARN  [main] zookeeper.ZKUtil: clean znode for master0x0, quorum=localhost:61181, baseZNode=/hbase Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:835)
        at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267)
        at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2304)
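
Because the same block repeats throughout the file, tallying the entries can be quicker than scrolling. A minimal Python sketch, assuming the log line format shown above (timestamp, level, [thread], logger:); the function names summarize_log and summarize_file are hypothetical helpers, not part of Ambari or HBase.

```python
import re
from collections import Counter

# Matches the start of an HBase/ZooKeeper log entry, e.g.
# 2016-03-17 21:37:59,273 WARN  [main] zookeeper.RecoverableZooKeeper: ...
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(?P<level>[A-Z]+)\s+\[[^\]]*\]\s+(?P<logger>[\w.$]+):"
)


def summarize_log(lines):
    """Count WARN/ERROR entries per (level, logger).

    Stack-trace lines ("at ...", bare exception names) do not match
    ENTRY and are ignored.
    """
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group("level") in ("WARN", "ERROR"):
            counts[(m.group("level"), m.group("logger"))] += 1
    return counts


def summarize_file(path):
    """Read a log file (tolerating undecodable bytes) and summarize it."""
    with open(path, errors="replace") as f:
        return summarize_log(f)
```

For example, summarize_file("/var/log/ambari-metrics-collector/hbase-ams-master-ITEM-70288.log").most_common(5) should show whether zookeeper.ClientCnxn connection failures dominate, as they do in the excerpt above.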

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Is your cluster Kerberized? Are you using the ZooKeeper service provided by AMS or the one from your Hadoop cluster?

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

@Jonas Straub My cluster is not Kerberized. I didn't understand the second part of your question. The ZooKeeper service is the one that was installed when I installed HDP through Ambari. The Metrics Collector was running fine until two days ago. I formatted the NameNode just yesterday. Please let me know if you need any other information. I have resolved all the major issues I faced previously on Ambari, but I am badly stuck on this. Thanks.

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

What are the values of the following configurations?

  • hbase.rootdir
  • hbase.cluster.distributed
  • Metrics service operation mode
  • hbase.zookeeper.property.clientPort
  • hbase.zookeeper.quorum

If you are using the Zookeeper Quorum that was installed during the HDP installation, make sure hbase.zookeeper.quorum contains the quorum of your cluster (see e.g. yarn.resourcemanager.zk-address) and hbase.zookeeper.property.clientPort is set to 2181.
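
The comparison described here, AMS HBase ZooKeeper settings versus the cluster's ZooKeeper address, can be sketched in Python. This assumes you have the text of the AMS HBase site XML (for example from /etc/ams-hbase/conf/hbase-site.xml, an assumed path) and the value of yarn.resourcemanager.zk-address; read_hadoop_conf and check_zk_settings are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Parse a Hadoop-style *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_zk_settings(props, cluster_zk_address):
    """Compare the AMS HBase ZooKeeper settings with the cluster ZooKeeper.

    cluster_zk_address is a "host1:2181,host2:2181" string such as the
    value of yarn.resourcemanager.zk-address. Returns a list of problems.
    """
    entries = [e.strip() for e in cluster_zk_address.split(",") if e.strip()]
    hosts = {e.rsplit(":", 1)[0].lower() for e in entries}
    ports = {e.rsplit(":", 1)[1] for e in entries if ":" in e}

    quorum_raw = props.get("hbase.zookeeper.quorum", "")
    quorum = {h.strip().lower() for h in quorum_raw.split(",") if h.strip()}
    port = props.get("hbase.zookeeper.property.clientPort")

    problems = []
    if quorum != hosts:
        problems.append(
            f"hbase.zookeeper.quorum={sorted(quorum)}, cluster ZooKeeper hosts={sorted(hosts)}"
        )
    if port not in ports:
        problems.append(
            f"hbase.zookeeper.property.clientPort={port}, cluster ZooKeeper port(s)={sorted(ports)}"
        )
    return problems
```

An empty list means the quorum hosts and client port line up with the cluster ZooKeeper; this check only makes sense when AMS is meant to use the cluster's ZooKeeper rather than its own.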

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

@Jonas Straub

Here are the values of the properties you have mentioned:

hbase.rootdir = hdfs://item-70288:8020/apps/hbase/data

hbase.cluster.distributed = true

Metrics service operation mode = embedded

hbase.zookeeper.property.clientPort = 2181

hbase.zookeeper.quorum = item-70288

In the error log I pasted, I noticed the line "hbase Unable to get data of znode /hbase/master". I thought (and still think) that it could be related to HBase data, so I followed the instructions in the link below, but without success.

https://community.hortonworks.com/questions/11779/hbase-master-shutting-down-with-zookeeper-delete-f...

Please note that I have also created another thread about the error I observed (link below). I am not abandoning this thread.

https://community.hortonworks.com/questions/23598/keepererrorcode-connectionloss-for-hbasemaster.htm...

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Have you changed hbase.zookeeper.property.clientPort to 2181? Looking at your logs, they still show the old/initial setting:

2016-03-17 21:37:58,071 INFO  [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)

Please change the Metrics service operation mode to distributed.

Could you please shut down all Metrics services and components? Also check the processes on the Metrics Collector node and make sure no AMS process is still running. Afterwards, restart the Metrics Collector service and check the logs for errors.

After restarting the Metrics Collector, also check whether the AMS logs show ZooKeeper port 61181 or 2181.
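
Checking which ZooKeeper port the AMS HBase client is actually using can be automated by pulling the port out of the SendThread(...) and quorum=... fragments seen in the logs in this thread. A minimal Python sketch under that assumption about the line format; zk_ports_in_log is a hypothetical helper name.

```python
import re
from collections import Counter

# ZooKeeper client lines in the AMS HBase log look like:
#   ... [main-SendThread(localhost:61181)] zookeeper.ClientCnxn: ...
#   ... Possibly transient ZooKeeper, quorum=localhost:61181, exception=...
# Only the first host:port of a quorum list is captured.
ZK_PORT = re.compile(r"(?:SendThread\([^:)]+:|quorum=[^,\s]*:)(\d+)")


def zk_ports_in_log(lines):
    """Return a Counter of ZooKeeper ports the HBase client tried to use."""
    ports = Counter()
    for line in lines:
        for port in ZK_PORT.findall(line):
            ports[int(port)] += 1
    return ports
```

If the result after a restart is dominated by 61181 rather than 2181, the collector is still reading the old clientPort value.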

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

@Jonas Straub I have not changed any port number manually. Port 2181 was already in the configuration, and when I rechecked it, it is still 2181. I checked all the properties but could not find port 61181 mentioned anywhere, so I wonder where it is being read from. Could you please help me figure out where port 61181 is configured?
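
To track down where a value such as 61181 comes from, one option is to search the configuration files on the Metrics Collector host. A Python sketch; the directories to search (for example /etc/ambari-metrics-collector/conf and /etc/ams-hbase/conf) are assumptions about a typical AMS install, not paths confirmed in this thread, and find_in_dirs and matching_lines are hypothetical helper names.

```python
import os


def matching_lines(lines, needle):
    """Return (line_number, text) for each line containing needle."""
    return [
        (lineno, line.rstrip("\n"))
        for lineno, line in enumerate(lines, start=1)
        if needle in line
    ]


def find_in_dirs(dirs, needle, suffixes=(".xml", ".properties", ".cfg", ".sh")):
    """Search config files under each directory for needle.

    Returns (path, line_number, text) tuples. Missing directories are
    skipped (os.walk yields nothing for them), as are unreadable files.
    """
    hits = []
    for top in dirs:
        for dirpath, _subdirs, filenames in os.walk(top):
            for name in filenames:
                if not name.endswith(suffixes):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="replace") as f:
                        for lineno, text in matching_lines(f, needle):
                            hits.append((path, lineno, text))
                except OSError:
                    continue  # permissions, broken symlink, etc.
    return hits
```

Since Ambari manages these files, any change found this way should still be made through the Ambari Metrics configuration in Ambari rather than by editing the files directly.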

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

@Jonas Straub

I found that port 61181 was set for the property hbase.zookeeper.property.clientPort in the Ambari Metrics configuration file, so I changed it to 2181. After this change, I made sure the AMS service was completely shut down and then restarted the Ambari Metrics Collector from the dashboard. When I checked the log, I found the following error:

java.io.IOException: Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.

I checked the port on which the ZooKeeper server is running and found that it also runs on 2181. Therefore, I changed hbase.zookeeper.property.clientPort to 2182 in both the Ambari Metrics (ams-hbase-site.xml) and HBase (hbase-site.xml) configurations through the Ambari dashboard. After this, I restarted the HBase Master and then the Metrics Collector service. Neither service started. I checked the logs and found the following errors:

hbase-hbase-master-ITEM-70288.log

---------------------------------------------------

2016-03-21 15:58:08,684 WARN [main-SendThread(Item-70288:2182)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

hbase-ams-master-ITEM-70288.log

----------------------------------------------

KeeperErrorCode = ConnectionLoss for /hbase/master

zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts

zookeeper.ZKUtil: clean znode for master0x0, quorum=item-70288:2182, baseZNode=/hbase Unable to get data of znode /hbase/master

I am attaching the log from the file hbase-ams-master-ITEM-70288.log.errorlog-2.txt
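
The earlier error in this post, "Could not start ZK at requested port of 2181. ZK was started at port: 2182", suggests that AMS in embedded mode tried to start its own ZooKeeper on 2181, found the port already held by the cluster's ZooKeeper, came up on the next port, and then aborted. A quick way to check for this before restarting is to test whether the port is already bound. A Python sketch; port_is_free and first_free_port are hypothetical helpers, and the increment loop only imitates what the log shows, it is not HBase's code.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start, attempts=10, host="0.0.0.0"):
    """Try start, start+1, ... and return the first bindable port (or None).

    Mirrors the behaviour seen in the log: 2181 was taken, so the
    embedded ZooKeeper ended up on 2182.
    """
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(port, host):
            return port
    return None
```

On the Collector host, port_is_free(2181) returning False while the cluster ZooKeeper is running would be consistent with the conflict in the error above.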

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Expert Contributor

I have fixed the Metrics Monitor startup issue. Although I still have no idea why the above error occurred, I fixed it by changing/setting the following properties in the Ambari Metrics configuration through Ambari:

  1. Set hbase.zookeeper.property.clientPort to 2181. I had observed that it was set to 61181, and I am not sure how that happened. This is a very important property. The ZooKeeper server also runs on this port, so I guess hbase.zookeeper.property.clientPort should be set to the same port. If the ports differ, the Ambari Metrics Collector service throws a number of errors when it is started. Therefore, make sure this property is set to the same port as the ZooKeeper server.
  2. Change the "Metrics Service operation mode" to distributed.
  3. Set hbase.cluster.distributed to true.
  4. Set hbase.rootdir to an HDFS folder. I created a new folder in the HDFS root and made the hdfs user the owner of the folder.

After doing this, I restarted only the Ambari Metrics Collector, and this time it started happily with a green flag ;)
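
The four changes in the list above can be turned into a quick pre-restart check. A Python sketch that encodes them as this post describes; the property name timeline.metrics.service.operation.mode (in ams-site) for "Metrics Service operation mode" is an assumption, and, as the next reply points out, whether the client port should match the cluster ZooKeeper depends on whether AMS runs embedded or distributed. check_distributed_ams is a hypothetical helper name.

```python
def check_distributed_ams(ams_site, ams_hbase_site, zk_server_port):
    """Return a list of settings that differ from the fix described above.

    ams_site and ams_hbase_site are {property: value} dicts. The key
    "timeline.metrics.service.operation.mode" is ASSUMED to be the
    property behind "Metrics Service operation mode" in Ambari.
    """
    problems = []
    port = ams_hbase_site.get("hbase.zookeeper.property.clientPort")
    if port != str(zk_server_port):
        problems.append(
            f"hbase.zookeeper.property.clientPort is {port}, expected {zk_server_port}"
        )
    mode = ams_site.get("timeline.metrics.service.operation.mode")
    if mode != "distributed":
        problems.append(f"operation mode is {mode}, expected distributed")
    if ams_hbase_site.get("hbase.cluster.distributed", "").lower() != "true":
        problems.append("hbase.cluster.distributed is not true")
    rootdir = ams_hbase_site.get("hbase.rootdir", "")
    if not rootdir.startswith("hdfs://"):
        problems.append(f"hbase.rootdir {rootdir!r} is not an HDFS path")
    return problems
```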

Re: Metrics Collector is not starting. Showing error related to Zookeeper and Hbase.

Guru

@Pradeep kumar AMS uses its own embedded HBase and ZooKeeper instances, not the ones installed in the cluster, and that's why the ports look different. Here are the official docs on how to move from embedded storage to distributed storage: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_ambari_reference_guide/content/ams_collec....
