Created 02-17-2016 01:17 AM
I have tried following the instructions in the link below, and the configuration saves just fine. But when I go to start the Metrics Collector, it appears to start and then shows up as being in a stopped state.
https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
I also changed the hbase.zookeeper.property.clientPort property to 2181 as described in the doc, but the log still shows the old port in the socket connection, along with a line saying "Session ... for server null".
In the web interface, however, I get the following errors:
Metrics Collector - ZooKeeper Server Process: Connection failed: [Errno 111] Connection refused to r9-01.maas:2181
Metrics Collector - HBase Master Process: Connection failed: [Errno 111] Connection refused to r9-01.maas:61310
Here is a little snippet from the /var/log/ambari-metrics-collector.log file. The rest of the log seemed to repeat the same messages.
2016-02-16 16:04:31,425 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
2016-02-16 16:04:31,426 WARN org.apache.zookeeper.ClientCnxn: Session 0x152e783fe0d0001 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2016-02-16 16:04:31,526 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
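To confirm which port the embedded ZooKeeper is actually listening on, one quick check is to send it the standard "ruok" four-letter-word command and look for an "imok" reply. Below is a minimal Python sketch, assuming the check runs on the collector host and using the two ports from above (61181, the embedded default the log still shows, and 2181, the value set per the wiki page); on newer ZooKeeper releases the four-letter words may need to be whitelisted.

import socket

def zk_ruok(host, port, timeout=5):
    """Return the raw reply to ZooKeeper's 'ruok' command, or None on failure."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        return None
    try:
        sock.sendall(b"ruok")
        return sock.recv(16)  # a healthy server answers b"imok"
    except socket.error:
        return None
    finally:
        sock.close()

for port in (61181, 2181):
    reply = zk_ruok("localhost", port)
    print("localhost:%d -> %s" % (port, reply or "no response / connection refused"))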
Created 02-17-2016 05:15 PM
The RegionServer heap size in the docs for a 400-node cluster is quite low; if you have memory available, I would recommend at least 8192 MB.
The only bottleneck with embedded mode is writing to a single disk; if that disk is not oversubscribed, embedded mode works well up to a 400-node cluster.
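If you want to confirm the current AMS heap settings before raising them, one option is to read the ams-hbase-env configuration through the Ambari REST API. Here is a minimal read-only sketch; the host, credentials, cluster name, and property names (hbase_master_heapsize / hbase_regionserver_heapsize) are assumptions, so verify them against your own Ambari instance.

import requests

# Assumed environment details -- adjust to your cluster.
AMBARI = "http://ambari-host:8080"
AUTH = ("admin", "admin")
CLUSTER = "mycluster"

# 1) Find the tag of the currently desired ams-hbase-env configuration.
resp = requests.get(
    "%s/api/v1/clusters/%s?fields=Clusters/desired_configs" % (AMBARI, CLUSTER),
    auth=AUTH)
resp.raise_for_status()
tag = resp.json()["Clusters"]["desired_configs"]["ams-hbase-env"]["tag"]

# 2) Fetch that configuration version and print the heap-related properties.
resp = requests.get(
    "%s/api/v1/clusters/%s/configurations?type=ams-hbase-env&tag=%s"
    % (AMBARI, CLUSTER, tag),
    auth=AUTH)
resp.raise_for_status()
props = resp.json()["items"][0]["properties"]
for key in ("hbase_master_heapsize", "hbase_regionserver_heapsize"):
    print("%s = %s" % (key, props.get(key, "<not set>")))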
Created 06-05-2017 06:51 PM
Awesome. The referenced articles helped me solve my ambari-metrics restart issue.
Here is the short form of the steps:
1) Stop Ambari Metrics Collector and Grafana
2) Back up the embedded hbase and hbase-temp folders, then delete them (see the sketch after these steps). The paths could differ from the defaults, so I looked up the hbase.rootdir and hbase.tmp.dir properties in Ambari > Ambari Metrics > Config for the appropriate paths.
3) Restart Ambari Metrics Collector and Grafana
The Ambari Metrics Collector started correctly without any errors.
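For step 2, here is a minimal Python sketch of the backup-and-delete. The paths below are the common embedded-mode defaults and are an assumption -- confirm hbase.rootdir and hbase.tmp.dir in Ambari > Ambari Metrics > Config first, and run it only after the Metrics Collector and Grafana are stopped.

import os
import shutil
import time

# Assumed defaults for embedded AMS; replace with the values of hbase.rootdir
# (minus any file:// prefix) and hbase.tmp.dir from your cluster's config.
AMS_DIRS = [
    "/var/lib/ambari-metrics-collector/hbase",
    "/var/lib/ambari-metrics-collector/hbase-tmp",
]

stamp = time.strftime("%Y%m%d-%H%M%S")

for path in AMS_DIRS:
    if not os.path.isdir(path):
        print("Skipping %s (not found)" % path)
        continue
    backup = "%s.bak-%s" % (path, stamp)
    # Move the data aside rather than deleting it outright, so it can be
    # restored if anything goes wrong; the collector should recreate the
    # directories on its next start.
    shutil.move(path, backup)
    print("Moved %s -> %s" % (path, backup))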
Note: As you may have suspected, by deleting the existing hbase folder, you will lose the metrics history.