Ambari Metrics - Switching from Embedded to Distributed Mode Fails

Contributor

I followed the instructions in the link below, and the configuration saves without errors. But when I start the Metrics Collector, it appears to start and then shows as stopped.

https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode

I also changed the hbase.zookeeper.property.clientPort property to 2181 as the doc describes, but the log still shows the old port (61181) in the socket connection, followed by a line saying "session... for server null".

In the web interface however, I get the following errors:

Metrics Collector - ZooKeeper Server Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:2181

Metrics Collector - HBase Master Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:61310
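
For anyone who wants to reproduce the "Connection refused" alerts outside the web interface, here is a minimal Python sketch of a TCP reachability check. It only tests whether something is listening on a port; it does not speak the ZooKeeper or HBase protocols. The function name port_is_open is hypothetical, and the timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111 on Linux), timeouts,
        # and name-resolution failures.
        return False
```

Calling port_is_open("r9-01.maas", 2181) and port_is_open("r9-01.maas", 61310) from the Ambari server host should return False while the alerts above are firing, assuming the alerts are plain TCP checks against those ports.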

Here is a little snippet from the /var/log/ambari-metrics-collector.log file. The rest of the log seemed to repeat the same messages.

2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x152e783fe0d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server 
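
The last log line above shows which ZooKeeper quorum the collector is actually using (localhost:61181 rather than port 2181). As a rough sketch, assuming the RecoverableZooKeeper warning always has the "quorum=..., exception=" shape seen here, the ports can be pulled out of the log like this. The helper name quorum_ports is hypothetical.

```python
import re

# Matches the "quorum=host:port[,host:port...]" field that precedes ", exception=".
QUORUM_RE = re.compile(r"quorum=(\S+?),\s")


def quorum_ports(log_line):
    """Return the set of ZooKeeper ports named in a 'quorum=' log field."""
    match = QUORUM_RE.search(log_line)
    if not match:
        return set()
    ports = set()
    for member in match.group(1).split(","):
        host, _, port = member.rpartition(":")
        if host and port.isdigit():
            ports.add(int(port))
    return ports
```

If the returned set does not contain the port you configured, the collector is most likely still running with the old (embedded) ZooKeeper settings.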

Master Mentor

Contributor

Thank you for the link. I found the outline under the General Guidelines section, which I hadn't seen before. Our cluster is 300-400 nodes, so I will leave the collector in embedded mode and reconfigure it per the settings below. Hopefully the collector will stop failing =)

Production | 200-400 | 200GB | embedded | n.a.

metrics_collector_heap_size=2048
hbase_regionserver_heapsize=2048
hbase_master_heapsize=2048
hbase_master_xmn_size=512

Master Mentor

@Kyle Pifer yes, as a general rule of thumb, check the docs first, then HCC, then wiki :).

Master Mentor
@Kyle Pifer

I would open a case with support if I were in your shoes.

Contributor

Thanks Swagle for the helpful info!


Make sure you have configured the right heap size, and validate the following configurations:

  • hbase.rootdir = hdfs://ams......
  • hbase.cluster.distributed=True
  • Metrics service operation mode=distributed
  • hbase.zookeeper.property.clientPort=2181
  • hbase.zookeeper.quorum=<zookeeper quorum, comma separated without port>
  • zookeeper.znode.parent=/ams-hbase-unsecure or /hbase-hbase-secure (depending on whether Kerberos is enabled)

Restart the Metrics Collector and make sure a new znode was created in ZooKeeper. Make sure HBase and the Metrics Collector have both started successfully.
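
The checklist above can be turned into a quick sanity check before restarting. This is only a sketch under assumptions: it takes the properties as a plain Python dict (how you export them from Ambari is up to you), the function name check_distributed_config is hypothetical, and the key timeline.metrics.service.operation.mode is my assumed property name for the "Metrics service operation mode" label.

```python
def check_distributed_config(props):
    """Return a list of problems found against the distributed-mode checklist."""
    problems = []

    if not str(props.get("hbase.rootdir", "")).startswith("hdfs://"):
        problems.append("hbase.rootdir should be an hdfs:// location")

    if str(props.get("hbase.cluster.distributed", "")).lower() != "true":
        problems.append("hbase.cluster.distributed should be true")

    # Assumed property name behind the 'Metrics service operation mode' label.
    if props.get("timeline.metrics.service.operation.mode") != "distributed":
        problems.append("operation mode should be distributed")

    if str(props.get("hbase.zookeeper.property.clientPort")) != "2181":
        problems.append("hbase.zookeeper.property.clientPort should be 2181")

    # Quorum is a comma-separated host list without ports.
    raw_quorum = str(props.get("hbase.zookeeper.quorum", ""))
    hosts = [h.strip() for h in raw_quorum.split(",") if h.strip()]
    if not hosts or any(":" in h for h in hosts):
        problems.append("hbase.zookeeper.quorum should list hosts without ports")

    if not str(props.get("zookeeper.znode.parent", "")).startswith("/"):
        problems.append("zookeeper.znode.parent should be an absolute znode path")

    return problems
```

An empty list means the values match the checklist; it does not confirm that the znode exists or that the services started, so those still need to be checked after the restart.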

Contributor

Thanks Jonas. I am trying embedded mode again per the 200-400 node guideline, along with Swagle's recommendation to increase the regionserver heap size. I will bookmark your response in case we grow the cluster enough to require distributed mode. When I tried distributed mode earlier, I did not change hbase.zookeeper.quorum or zookeeper.znode.parent, so that could have been the cause. Thanks again for taking the time 😃

Explorer

I came across this article while searching for a resolution to Errno 111.

In case anyone else stumbles upon this article for the same reason, I was able to get my metrics working again by resetting metrics.

https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html

http://jonmorisissqlblog.blogspot.com/2016/08/ambari-metrics-collector-not-starting.html