Ambari-Metrics collector not starting

When I start the Ambari Metrics collector, the start operation reports no error, but the collector never comes up. This is what I see in the log file:

Value of zookeeper.znode.parent is: /hbase-unsecure

 retries=35, started=229269 ms ago, cancelled=false, msg=
2016-02-09 19:39:15,043 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:15,043 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=20, retries=35, started=249286 ms ago, cancelled=false, msg=
I could finally solve it by combining some of the steps mentioned above.

I first checked the value of `zookeeper.znode.parent` in HBase and tried setting the same value in Ambari, but that did not work because some of the metrics processes were still running on that machine. So I had to run `ps -ef | grep metrics` and kill all of them, as they were still holding on to the cached `/hbase` value.
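A minimal sketch of that cleanup, with one caveat visible in the `ps` output posted further down this thread: `grep metrics` also matches the Kafka broker (its classpath contains the ambari-metrics-kafka-sink jar) and the Ambari agent script, which you probably do not want to kill. Filtering by the owner `ams` is narrower. The function name `pids_owned_by` is hypothetical, and it assumes the usual `ps -ef` / `ps aux` layout where column 1 is the user and column 2 is the PID.

```shell
# Hypothetical helper: read `ps -ef` or `ps aux` output on stdin and print
# the PIDs of processes owned by the given user (default: ams).
pids_owned_by() {
    awk -v user="${1:-ams}" '$1 == user { print $2 }'
}

# Example (not run here): review the list first, then kill by hand.
#   ps -ef | pids_owned_by ams
```

The function only prints PIDs, so nothing is killed until you have looked at the list.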

Watch the Ambari Metrics collector log (/var/log/ambari-metrics-collector/ambari-metrics-collector.log) while you perform the steps below:


0. tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log

1. Stop Ambari Metrics (the Ambari server itself has to stay up for the REST call in step 3)

2. Kill all the metrics processes

3. curl --user admin:admin -i -H "X-Requested-By: ambari" -X DELETE http://`hostname -f`:8080/api/v1/clusters/CLUSTERNAME/services/AMBARI_METRICS

=> Make sure you replace CLUSTERNAME with your cluster name

4. Refresh Ambari UI

5. Add Service

6. Select Ambari Metrics

7. In the configuration screen, make sure to set `zookeeper.znode.parent` to the value configured in the HBase service. By default, Ambari Metrics leaves it empty.

8. Deploy
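Step 3 is the only step done outside the UI. Here is a hedged sketch of building that URL so the cluster name and host are substituted in one place. `ams_service_url` is a hypothetical helper; port 8080 and the `admin:admin` credentials are simply the values from the command in step 3.

```shell
# Hypothetical helper: build the REST URL used in step 3.
# Arguments: cluster name, Ambari server host, optional port (default 8080).
ams_service_url() {
    printf 'http://%s:%s/api/v1/clusters/%s/services/AMBARI_METRICS\n' \
        "$2" "${3:-8080}" "$1"
}

# Examples (not run here):
#   curl --user admin:admin "http://$(hostname -f):8080/api/v1/clusters"   # look up CLUSTERNAME
#   curl --user admin:admin -i -H "X-Requested-By: ambari" -X DELETE \
#        "$(ams_service_url CLUSTERNAME "$(hostname -f)")"
```

Listing `/api/v1/clusters` first is a cheap way to confirm the exact cluster name before issuing the DELETE.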



@Prakash Punj What's in the log files?

cd /var/log/ambari-metrics-collector/

Please check the logs there and look for errors.

@Neeraj Sabharwal

Below is what's in the log (ambari-metrics-collector.log)

2016-02-10 20:38:55,744 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:38:55,745 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=18, retries=35, started=209288 ms ago, cancelled=false, msg=
2016-02-10 20:39:15,811 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:39:15,811 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229354 ms ago, cancelled=false, msg=
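The ERROR lines above name the znode the collector's HBase client is actually looking for, which is the value to compare against the configured `zookeeper.znode.parent`. A small sketch for pulling it out of the log; `missing_znodes` is a hypothetical helper and it relies only on the message wording shown in this thread.

```shell
# Hypothetical helper: read collector log lines on stdin and print each
# distinct znode that the HBase client reports as missing from ZooKeeper.
missing_znodes() {
    sed -n 's|.*The node \(/[^ ]*\) is not in ZooKeeper.*|\1|p' | sort -u
}

# Example (not run here):
#   missing_znodes < /var/log/ambari-metrics-collector/ambari-metrics-collector.log
```

If this prints `/hbase` while the service is configured with a different parent znode, the running process is not picking up the configured value.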


What Ambari version is this?

@Jonas Straub

Ambari version is




If you have a kerberized environment, make sure hbase.regionserver.kerberos.principal and hbase.master.kerberos.principal are the same; a mismatch has caused issues in the past.

Also set zookeeper.znode.parent to /ams-hbase if you have NOT kerberized your environment; otherwise set it to /ams-hbase-secure.

Stop all Ambari Metrics components, log in to the machine, and make sure no metrics process is still running (ps aux | grep metrics).

Start Metrics again and check the HBase Master and Metrics Collector logs (both in /var/log/ambari-metrics-collector/...).
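One way to confirm which parent znodes actually exist is to list the ZooKeeper root and compare it with the configured value. This is a rough sketch, assuming the listing comes back in the bracketed form `[zookeeper, hbase-unsecure]`. `znode_listed` is a hypothetical helper, not part of any Ambari or ZooKeeper tooling.

```shell
# Hypothetical helper: succeed if the znode named in $1 (e.g. /ams-hbase)
# appears in a root listing such as "[zookeeper, hbase-unsecure]" read
# from stdin. The bracketed format is an assumption about `ls /` output.
znode_listed() {
    tr -d '[] ' | tr ',' '\n' | grep -qxF -- "${1#/}"
}

# Example (not run here; assumes zookeeper-client accepts a command on the
# command line the way zkCli.sh does):
#   zookeeper-client -server <zk-host>:<port> ls / | znode_listed /ams-hbase && echo present
```

The match is on the whole name, so `/hbase` does not count as present just because `/hbase-unsecure` exists.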

Are you using a distributed or embedded mode?

Could you please post the following configurations:

  • hbase.rootdir
  • hbase.cluster.distributed
  • Metrics service operation mode
  • hbase.zookeeper.quorum


@Jonas Straub

I changed /hbase to /ams-hbase and restarted, but with no success.

hbase.rootdir  --hdfs://
hbase.cluster.distributed   -  TRUE
Metrics service operation mode - embedded  -- 2181
hbase.zookeeper.quorum ---

Output of the metrics-collector log:

2016-02-11 20:34:32,065 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-11 20:34:32,065 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229019 ms ago, cancelled=false, msg=

Output of ps aux | grep metrics:

kafka     6776  1.4  5.0 4666216 404744 ?      Sl   Feb09  41:25 /usr/jdk64/jdk1.8.0_60/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/ -cp :/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/hdp/* kafka.Kafka /usr/hdp/
root     18818  3.0  0.1 352240 15196 ?        S    20:13   0:01 /usr/bin/python2 /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ START /var/lib/ambari-agent/data/auto_command-1454973527.json /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package /var/lib/ambari-agent/data/structured-out-1454973527.json INFO /var/lib/ambari-agent/tmp
ams      19096 17.3  1.1 3779588 90244 ?       Sl   20:14   0:04 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_zookeeper -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201602112014 -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-zookeeper-hdp-s2.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.root.logger=INFO,RFA,RFAS org.apache.hadoop.hbase.zookeeper.HQuorumPeer start

root     19399  0.0  0.0  11300  1360 ?        S    20:14   0:00 /bin/bash /var/lib/ambari-agent/ su ams -l -s /bin/bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
root     19400  0.0  0.0  48136  1488 ?        S    20:14   0:00 su ams -l -s /bin/bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19406  0.1  0.0 108164  1584 ?        Ss   20:14   0:00 -bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19430  0.2  0.0 106196  1564 ?        S    20:14   0:00 bash /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19465 59.0  3.7 3905584 306212 ?      Sl   20:14   0:14 /usr/jdk64/jdk1.8.0_60/bin/java -Xms1024m -Xmx1024m -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-201602112014 -cp /usr/lib/ambari-metrics-collector/*:/etc/ambari-metrics-collector/conf -Dams.log.dir=/var/log/ambari-metrics-collector -Dproc_timelineserver org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
ams      19499  185  2.7 2032776 218940 ?      Sl   20:14   0:40 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_shell -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -Dhbase.ruby.sources=/usr/lib/ams-hbase/bin/../lib/ruby -Xmx256m -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.root.logger=INFO,console,NullAppender org.jruby.Main -X+O /usr/lib/ams-hbase/bin/../bin/hirb.rb
root     19618  0.0  0.0 103308   904 pts/1    S+   20:14   0:00 grep metrics


Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"?
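The pairing behind that suggestion can be written down as a tiny check. This is only a sketch: `check_ams_mode` is a hypothetical helper, and the converse pairing (false with embedded) is the assumption it encodes alongside the true/distributed pairing stated above.

```shell
# Hypothetical helper: compare hbase.cluster.distributed ($1: true/false)
# with the Metrics service operation mode ($2: embedded/distributed).
check_ams_mode() {
    case "$(printf '%s' "$1" | tr 'A-Z' 'a-z'):$2" in
        true:distributed|false:embedded)
            echo "consistent" ;;
        *)
            echo "mismatch: hbase.cluster.distributed=$1 but operation mode=$2"
            return 1 ;;
    esac
}

# Example with the values posted in this thread:
#   check_ams_mode TRUE embedded
```

With the values reported earlier (TRUE and embedded), the function reports a mismatch and returns a non-zero status.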

@Prakash Punj

Edit /conf/hbase-site.xml in the HBase config directory, or add:





For example:





@Geoffrey Shelton Oko

@Neeraj Sabharwal

Still the same result. I am wondering whether it's a good idea to wipe out all the Ambari Metrics components and reinstall the service. What's the clean way to do that?




You don't have to remove and reinstall the Ambari Metrics service from Ambari; I am pretty sure this will not solve the problem!

Please see my comment above => Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"

If this is a new installation, you can try to remove all Metrics data:

  1. Stop Ambari Metrics (Collector + all monitors)
  2. Make sure no Metrics process is running (you can kill all processes belonging to user "ams")
  3. Remove data from hdfs (hdfs dfs -rmr hdfs://
  4. Remove data from zookeeper (login: zookeeper-client -server; removal: rmr /<hbase znode>)
  5. Start the Ambari Metrics Collector (not the monitors!)
  6. Check whether the collector starts; if not, please upload the hbase-master and ambari-metrics-collector logs
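Because steps 3 and 4 delete data, it can help to print the exact commands and read them before running anything. A dry-run sketch under stated assumptions: `print_ams_cleanup` is a hypothetical helper, every argument must be supplied by you from your own configuration, and it uses `hdfs dfs -rm -r`, the non-deprecated spelling of `-rmr`.

```shell
# Hypothetical helper: print (do not execute) the cleanup commands from
# steps 3 and 4.
#   $1 = AMS HBase root directory (your hbase.rootdir value)
#   $2 = AMS znode (your zookeeper.znode.parent value)
#   $3 = ZooKeeper host:port
print_ams_cleanup() {
    [ "$#" -eq 3 ] || {
        echo "usage: print_ams_cleanup ROOTDIR ZNODE ZK_HOST:PORT" >&2
        return 2
    }
    printf 'hdfs dfs -rm -r %s\n' "$1"
    printf 'zookeeper-client -server %s rmr %s\n' "$3" "$2"
}

# Example (placeholders only, not run here):
#   print_ams_cleanup <hbase.rootdir> <hbase znode> <zk-host>:<port>
```

Nothing is removed by the function itself; copy a printed command only after checking that the path and znode belong to Ambari Metrics and not to the main HBase service.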

Is this a secured (kerberized) or unsecured (no kerberos) cluster?

There are other steps we can try, but let's try the above first.
