Support Questions

v-kypife · ‎02-17-2016

I followed the instructions in the link below, and the configuration saves just fine. But when I start the Metrics Collector, it appears to start and then shows as stopped.

https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode

I changed the hbase.zookeeper.property.clientPort property to 2181 as described in the doc, but the log still shows the old port (61181) in the socket connection, followed by a line saying "Session ... for server null".

In the web interface, however, I get the following errors:

Metrics Collector - ZooKeeper Server Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:2181

Metrics Collector - HBase Master Process
Connection failed: [Errno 111] Connection refused to r9-01.maas:61310
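The "[Errno 111] Connection refused" alerts mean nothing is accepting TCP connections on those ports. A minimal Python sketch for checking this by hand is below; the helper name port_open is hypothetical and this is not part of Ambari, just a plain socket probe.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts and DNS failures.
        return False
```

For example, calling port_open("r9-01.maas", 2181) and port_open("r9-01.maas", 61310) from the Ambari server host would show whether the ZooKeeper and HBase Master ports named in the alerts are reachable at all.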

Here is a little snippet from the /var/log/ambari-metrics-collector.log file. The rest of the log seemed to repeat the same messages.

2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x152e783fe0d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
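To confirm which ZooKeeper quorum the collector is actually using, the quorum= field can be pulled out of the log. The following is a rough Python sketch that assumes the log line format shown above; the function names are hypothetical.

```python
import re

# Matches e.g. "quorum=localhost:61181, exception=..." and also
# multi-host values such as "quorum=a:2181,b:2181, exception=...".
QUORUM_RE = re.compile(r"quorum=(\S+?),\s")


def quorums_in_log(lines):
    """Return the distinct quorum strings found, in order of first appearance."""
    found = []
    for line in lines:
        match = QUORUM_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def uses_port(quorum, port):
    """True only if every host:port entry in the quorum string ends in the given port."""
    return all(entry.rsplit(":", 1)[-1] == str(port) for entry in quorum.split(","))
```

Run against the snippet above, this reports localhost:61181, which does not use port 2181, so the collector is still connecting to the embedded ZooKeeper address rather than the cluster one.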

aervits · ‎02-17-2016

@Kyle Pifer please use our docs to install; wikis tend to be a bit out of date. http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/ch_amb_ref...

v-kypife · ‎02-17-2016

Thank you for the link. Specifically, I saw the outline under the General Guidelines section, which I hadn't seen before. Our cluster is 300-400 nodes, so I will leave the collector in embedded mode and reconfigure it per the settings below; hopefully the collector will stop failing =)

Production | 200-400 | 200GB | embedded | n.a.

metrics_collector_heap_size=2048
hbase_regionserver_heapsize=2048
hbase_master_heapsize=2048
hbase_master_xmn_size=512

aervits · ‎02-17-2016

@Kyle Pifer yes, as a general rule of thumb, check the docs first, then HCC, then wiki :).

nsabharwal · ‎02-17-2016

@Kyle Pifer

I would open a case with support if I were in your shoes.

sidwagle · ‎02-17-2016

The RegionServer (RS) heap size in the docs for a 400-node cluster is quite low; if you have memory available, I would recommend at least 8192.

The only bottleneck with embedded mode is writing to a single disk. If the disk is not oversubscribed, embedded mode works well for clusters of up to 400 nodes.

v-kypife · ‎02-17-2016

Thanks Swagle for the helpful info!

jstraub · ‎02-17-2016

Make sure you have configured the right heap size, and validate the following configurations:

hbase.rootdir = hdfs://ams......
hbase.cluster.distributed=True
Metrics service operation mode=distributed
hbase.zookeeper.property.clientPort=2181
hbase.zookeeper.quorum=<zookeeper quorum, comma separated without port>
zookeeper.znode.parent=/ams-hbase-unsecure or /ams-hbase-secure (depending on whether Kerberos is enabled)
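As a rough illustration, the checklist above can be turned into a small Python sanity check over a dict of property values. This is only a sketch: the function name is hypothetical, and the key timeline.metrics.service.operation.mode is assumed to be the property behind "Metrics service operation mode".

```python
# Exact values from the checklist (compared case-insensitively).
EXPECTED = {
    "hbase.cluster.distributed": "true",
    "timeline.metrics.service.operation.mode": "distributed",  # assumed property name
    "hbase.zookeeper.property.clientPort": "2181",
}


def check_distributed_config(cfg):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for key, want in EXPECTED.items():
        got = str(cfg.get(key, "")).strip().lower()
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    if not cfg.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir: expected an hdfs:// URI")
    hosts = [h.strip() for h in cfg.get("hbase.zookeeper.quorum", "").split(",") if h.strip()]
    if not hosts:
        problems.append("hbase.zookeeper.quorum: no hosts listed")
    elif any(":" in h for h in hosts):
        problems.append("hbase.zookeeper.quorum: list hosts without ports")
    if not cfg.get("zookeeper.znode.parent", "").startswith("/"):
        problems.append("zookeeper.znode.parent: expected a path starting with '/'")
    return problems
```

The values would have to be copied in by hand from the Ambari Metrics configs; the sketch does not talk to Ambari or ZooKeeper.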

Restart the Metrics Collector and make sure a new znode was created in ZooKeeper. Make sure HBase and the Metrics Collector have started successfully.

v-kypife · ‎02-17-2016

Thanks Jonas. I am in the process of trying embedded mode again per the 200-400 node guideline, along with Swagle's recommendation of increasing the RegionServer heap size. I will bookmark your response, though, in case we grow the cluster enough to require distributed mode. When I tried distributed mode earlier, I did not change hbase.zookeeper.quorum or zookeeper.znode.parent, so that could have been it. Thanks again for taking the time 😃

Jon_Morisi · ‎08-26-2016

I came across this article while searching for a resolution to Errno 111.

In case anyone else stumbles upon this article for the same reason, I was able to get my metrics working again by resetting metrics.

https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html

http://jonmorisissqlblog.blogspot.com/2016/08/ambari-metrics-collector-not-starting.html

Cloudera Community

Ambari Metrics - Switching from Embedded to Distributed Mode Fails