Metrics collector + ams-hbase-unsecure: Received unexpected KeeperException

Hi all,

We have an Ambari cluster with HDP version 2.6.0.

When we try to start the Metrics Collector from Ambari, we get the following logs:

2019-01-02 06:35:16,406 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection-0x4bdeaabb-0x1680d44c1810001, quorum=master02.sys32.com:61181, baseZNode=/ams-hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/table/SYSTEM.CATALOG
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)




2019-01-02 06:35:18,256 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys32.com/101.23.98.60:61181. Will not attempt to authenticate using SASL (unknown error)
2019-01-02 06:35:18,257 WARN org.apache.zookeeper.ClientCnxn: Session 0x1680d44c1810001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2019-01-02 06:35:19,656 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys32.com/101.23.98.60:61181. Will not attempt to authenticate using SASL (unknown error)

We also cleared the contents of /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/* and restarted the Ambari Metrics Collector, but this did not help.

What else can we do to solve this issue?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

Can you please share the complete AMS collector logs, along with the HMaster log? You can find it in the same directory as the AMS collector log.

Do you have enough free memory available on the system where AMS is running?

Also, how many nodes are in your cluster? Based on that, we can check whether the heap settings for the AMS collector and HMaster are OK.

Also, let us know whether your AMS collector is running in Embedded Mode or Distributed Mode.

Looking at collector-gc.log and gc.log (for the HMaster process) can also be useful.

What is the Ambari version, and what is the output of the following command? Was an Ambari upgrade performed recently? When was AMS last running fine? Were any recent changes made to the AMS configs?

# rpm -qa | grep ambari

7 REPLIES 7

Master Mentor

@Michael Bronson

Your current thread looks very similar to the other HCC thread you opened: https://community.hortonworks.com/questions/231177/metrics-failed-on-orgapachehadoophbasezookeeperzo...

Can you please mark one of the HCC threads as closed, so that all HCC users can respond in a single thread?

Here are the versions:

# rpm -qa | grep ambari
ambari-agent-2.5.0.3-7.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-monitor-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.5.0.3-7.x86_64
ambari-server-2.5.0.3-7.x86_64
Michael-Bronson

We have enough memory:

# free -g
              total        used        free      shared  buff/cache   available
Mem:             31          14           5           1          11          14
Swap:             7           0           7
Michael-Bronson

How can we check whether we have Embedded Mode or Distributed Mode?

Michael-Bronson

Master Mentor

@Michael Bronson

We can find the AMS collector Mode by looking at the "ams-site.xml" file.

# grep -A1 'mode' /etc/ambari-metrics-collector/conf/ams-site.xml 
     <name>timeline.metrics.service.operation.mode</name>
     <value>embedded</value>
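
A side note on the grep command: the pattern 'mode' also matches any other property whose name or value contains "mode". If you want only the operation mode, a narrower sketch is below. The function name ams_mode is made up for illustration (it is not part of Ambari), and it assumes the <value> line directly follows the <name> line in the file.

```shell
# Print the AMS operation mode from an ams-site.xml file.
# ams_mode is a hypothetical helper name, not an Ambari command.
# Assumption: the <value> line directly follows the matching <name> line.
# Usage: ams_mode [path-to-ams-site.xml]
ams_mode() {
  # Default to the path used earlier in this thread.
  conf="${1:-/etc/ambari-metrics-collector/conf/ams-site.xml}"
  # -F treats the pattern as a fixed string; -A1 also prints the next line.
  grep -F -A1 '<name>timeline.metrics.service.operation.mode</name>' "$conf" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

Calling ams_mode with no argument reads the default path and prints just the value of that property.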



If AMS is running in Embedded mode and you keep getting the same ZooKeeper error, try increasing the AMS collector heap size to 2 GB or more (for example 4 GB) and then start it again. Please try increasing the HBase Master heap size as well.
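
The "Connection refused" lines in the log mean that nothing was accepting connections on port 61181 of the collector host when the client tried. As far as I know, in Embedded mode that port is served by the ZooKeeper started by the AMS HBase itself, so it stays closed if the HBase Master does not come up. To see whether the port is open after a restart, a quick check like the one below may help. It is only a sketch: port_open is a made-up helper name, and it relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.

```shell
# port_open is a hypothetical helper: it succeeds if a TCP connection
# to host:port can be opened, and fails otherwise. No data is sent.
# Relies on bash's built-in /dev/tcp redirection (bash only).
# Usage: port_open HOST PORT
port_open() {
  # The subshell opens the socket on fd 3 and closes it again on exit.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

With the host and port from the log in the question, that would be: port_open master02.sys32.com 61181 && echo open || echo closed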


For AMS tuning it is best to refer to the following article.
https://community.hortonworks.com/content/supportkb/208353/troubleshooting-ambari-metrics-ams.html

Dear Jay, metrics is now up after we set a new value: hbase_master_heapsize = 1400

Michael-Bronson