Support Questions

Ambari Metrics Collector

The Ambari Metrics Collector stopped on our machine. When we try to restart it in Ambari, the restart fails, but when I check the processes on the machine, they are running.

I also get the following Ambari alert:

Metrics Collector - Auto-Restart Status

Metrics Collector has been auto-started 2 times since 2016-07-29 00:12:30.

I see the following error in the logs:

6:50:24,047 ERROR [main] ZooKeeperWatcher:652 - hconnection-0x5a7005d-0x156315434410005, quorum=localhost:61181, baseZNode=/ams-hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:417)
Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)

I even tried reinstalling the Metrics Collector, but it still does not work. Any thoughts on how to fix this?

I have already seen a few posts in the forum, but none of them helped.

1 ACCEPTED SOLUTION

Super Collaborator

Hi @ARUN,

Please clear the contents of /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/* and restart the Ambari Metrics Collector. Let me know if that works!

Thanks
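A minimal Python sketch of the "clear the contents" step, for readers who want to script it. It assumes the default embedded-mode path quoted in the answer above, and it assumes (not stated in the answer) that you stop the Metrics Collector in Ambari before running it and start it again afterwards. The function name `clear_directory_contents` is hypothetical; it removes everything inside the directory but keeps the directory itself, similar to `rm -rf .../zookeeper/*`.

```python
import shutil
from pathlib import Path

# Path taken from the accepted answer; assumption: default embedded-mode layout.
ZK_TMP_DIR = Path("/var/lib/ambari-metrics-collector/hbase-tmp/zookeeper")


def clear_directory_contents(directory: Path) -> int:
    """Delete every entry inside `directory`, keeping the directory itself.

    Similar to `rm -rf <directory>/*`, except that hidden (dot) entries are
    removed too. Returns the number of top-level entries removed, or 0 if
    the directory does not exist.
    """
    removed = 0
    if not directory.is_dir():
        return removed
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # recursive delete of a real subdirectory
        else:
            entry.unlink()  # plain file or symlink (the link, not its target)
        removed += 1
    return removed


if __name__ == "__main__":
    # Assumption: the Metrics Collector has already been stopped in Ambari.
    count = clear_directory_contents(ZK_TMP_DIR)
    print(f"Removed {count} entries from {ZK_TMP_DIR}")
```

Keeping the directory rather than deleting and recreating it avoids changing its ownership or permissions, which the collector's service user presumably relies on.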

15 Replies

Explorer

@Aravindan Vijayan,

Clearing the contents of /var/lib/ambari-metrics-collector/hbase-tmp/* and restarting the Ambari Metrics Collector completely fixed our problem on HDInsight Hadoop 3.6.1 (HDP 2.6.5).

It was a little confusing because HBase is not deployed on the Azure Hadoop 3.6.1 cluster. Evidently, Ambari Metrics runs its own internal HBase instance?

FYI, the error in /var/log/ambari-metrics-collector/ambari-metrics-collector.log was:

21:09:16,997  INFO [main-SendThread(xxxx.cx.internal.cloudapp.net:61181)] ClientCnxn:1032 - Opening socket connection to server xxx.cx.internal.cloudapp.net/xx.xxx.xx.xx:61181. Will not attempt to authenticate using SASL (unknown error)
21:09:16,998  WARN [main-SendThread(xxx.cx.internal.cloudapp.net:61181)] ClientCnxn:1162 - Session 0x16886b291270010 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

Thanks
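The `Connection refused` above means nothing was accepting TCP connections on port 61181, the port of the embedded ZooKeeper that AMS's internal HBase uses. A minimal Python sketch to check that from the collector host follows; the host value `localhost` is an assumption (the HDInsight log above connects to the collector's hostname instead), and `port_accepts_connections` is a hypothetical helper name.

```python
import socket

# Assumptions: run on the collector host; the embedded ZooKeeper uses the
# port 61181 seen in the logs in this thread.
AMS_ZK_HOST = "localhost"
AMS_ZK_PORT = 61181


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    False corresponds to the `java.net.ConnectException: Connection refused`
    (or a timeout) that the ZooKeeper client logs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return False


if __name__ == "__main__":
    ok = port_accepts_connections(AMS_ZK_HOST, AMS_ZK_PORT)
    print(f"{AMS_ZK_HOST}:{AMS_ZK_PORT} accepting connections: {ok}")
```

If this returns False while the collector processes still appear to be running, that is consistent with the original question: the processes exist, but the embedded ZooKeeper is not serving on its port.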

Explorer

This solution works for me on HDP 3.1.4 with Ambari 2.7.

 

Thanks for sharing.

Rising Star

Thanks, it solved the problem, but how did you find the answer? I would never have guessed it myself.

Expert Contributor

@ARUN, did you get this fixed? @mqureshi, @Aravindan Vijayan, looping you in as well.

I'm getting the same error:

https://community.hortonworks.com/questions/70820/kerberized-hdp-24-amabari-metric-v-2220-shutting-d...

----------------------------------

2016-12-11 17:00:38,348 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure
2016-12-11 17:00:38,348 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts
2016-12-11 17:00:38,348 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x155aa3ef-0x158eed3ab0f0002, quorum=localhost:61181, baseZNode=/ams-hbase-secure Unable to set watcher on znode (/ams-hbase-secure)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:221)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:417)

Explorer

@Aravindan Vijayan, we have a 20-node Hadoop cluster, and we have completely installed the Base service in it. The Ambari Metrics Collector mode is embedded. We notice that the Ambari Metrics Collector starts and then stops again after some time. The logs show the following error:

2017-12-12 11:03:38,443 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection-0x70cf32e3-0x1604b8318690004, quorum=hdpmprod000.corp.pgcore.com:61181, baseZNode=/ams-hbase-unsecure Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server

Since we do not have the HBase service in our cluster, can I comment out all the AMS HBase configuration to avoid these AMS HBase issues, or what is the best way to handle this?

Any help is much appreciated.

Thanks,

Abhishek

Contributor

@Geoffrey Shelton Okot, @ARUN, @Aravindan Vijayan, @Abhishek Reddy Chamakura, @Karan Alang, did any of you find a permanent solution for this issue?

We are getting the error "KeeperErrorCode = NodeExists for /ams-hbase-secure/namespace/hbase".

We are facing the same issue.

email address: tauqeerkhan@outlook.com