Ambari-Metrics collector not starting

Expert Contributor

When I start the Ambari Metrics Collector, the start operation reports no error, but the collector never comes up. This is what I see in the log file:

Value of zookeeper.znode.parent is: /hbase-unsecure

 retries=35, started=229269 ms ago, cancelled=false, msg=
2016-02-09 19:39:15,043 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:15,043 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=20, retries=35, started=249286 ms ago, cancelled=false, msg=
2016-02-09 19:39:35,080 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:35,081 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=21, retries=35, started=269324 ms ago, cancelled=false, msg=
2016-02-09 19:39:55,151 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:55,152 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=22, retries=35, started=289395 ms ago, cancelled=false, msg=
2016-02-09 19:40:15,192 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:15,193 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=23, retries=35, started=309436 ms ago, cancelled=false, msg=
2016-02-09 19:40:35,275 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:35,276 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=24, retries=35, started=329519 ms ago, cancelled=false, msg=
2016-02-09 19:40:55,283 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:55,283 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=25, retries=35, started=349526 ms ago, cancelled=false, msg=
2016-02-09 19:41:15,437 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:15,437 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=26, retries=35, started=369680 ms ago, cancelled=false, msg=
2016-02-09 19:41:35,604 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:35,604 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=27, retries=35, started=389847 ms ago, cancelled=false, msg=
2016-02-09 19:41:55,633 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:55,635 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=28, retries=35, started=409878 ms ago, cancelled=false, msg=
2016-02-09 19:42:15,646 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:15,646 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=29, retries=35, started=429889 ms ago, cancelled=false, msg=
2016-02-09 19:42:35,841 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:35,841 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=30, retries=35, started=450084 ms ago, cancelled=false, msg=
2016-02-09 19:42:56,032 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:56,032 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=31, retries=35, started=470275 ms ago, cancelled=false, msg=
2016-02-09 19:43:16,088 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:43:16,088 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=32, retries=35, started=490331 ms ago, cancelled=false, msg=
2016-02-09 19:43:36,265 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:43:36,265 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=33, retries=35, started=510508 ms ago, cancelled=false, msg=
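
The log above shows the client looking for /hbase while the configured value is /hbase-unsecure. A minimal sketch for pulling the znode name out of the collector log, assuming the message format shown above; `missing_znodes` is a hypothetical helper name, not part of Ambari:

```shell
# Print each distinct znode that the HBase client reported as missing.
# Reads collector log lines from stdin.
missing_znodes() {
  sed -n 's|.*The node \(/[^ ]*\) is not in ZooKeeper.*|\1|p' | sort -u
}

# Example (log path taken from this thread):
# missing_znodes < /var/log/ambari-metrics-collector/ambari-metrics-collector.log
```

If the printed znode differs from the value of zookeeper.znode.parent you expect, the collector is probably still running with an older setting.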


1 ACCEPTED SOLUTION

Contributor

I could finally solve it by combining some of the steps mentioned above.

I first checked the value of `zookeeper.znode.parent` in HBase. I tried setting that same value in Ambari, but that did not work because some of the metrics processes were still running on that machine. So I had to run `ps -ef | grep metrics` and kill all of them, as they were caching the `/hbase` value.
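
A rough sketch of that first check, assuming the usual Hadoop `<name>`/`<value>` layout in hbase-site.xml. `hbase_site_value` is a hypothetical helper, and the file paths in the example comment are assumptions that may differ on your install:

```shell
# Print the value of one property from a Hadoop-style XML config file.
# $1 = property name, $2 = path to the XML file
hbase_site_value() {
  awk -v name="$1" '
    # Remember that we have seen the <name> element we are looking for.
    index($0, "<name>" name "</name>") { found = 1 }
    # The first <value> element after it holds the setting.
    found && match($0, /<value>[^<]*<\/value>/) {
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Example: compare what HBase and the collector are configured with
# (both paths are assumptions).
# hbase_site_value zookeeper.znode.parent /etc/hbase/conf/hbase-site.xml
# hbase_site_value zookeeper.znode.parent /etc/ambari-metrics-collector/conf/hbase-site.xml
```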

Watch the Ambari Metrics Collector log (/var/log/ambari-metrics-collector/ambari-metrics-collector.log) while you do the steps below.

Steps:

0. tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log

1. Stop Ambari

2. Kill all the metrics processes

3. curl --user admin:admin -i -H "X-Requested-By: ambari" -X DELETE http://`hostname -f`:8080/api/v1/clusters/CLUSTERNAME/services/AMBARI_METRICS

=> Make sure you replace CLUSTERNAME with your cluster name

4. Refresh Ambari UI

5. Add Service

6. Select Ambari Metrics

7. In the configuration screen, make sure to set the value of `zookeeper.znode.parent` to what is configured in the HBase service. By default in Ambari Metrics it is set to an empty value.

8. Deploy
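
Step 3 can be wrapped so the cluster name and credentials are passed in rather than edited into the command. This is only a sketch of the same curl call shown above; the function names are hypothetical, and port 8080 with plain HTTP is taken from the command in step 3:

```shell
# Build the REST URL used in step 3.
# $1 = Ambari server host, $2 = cluster name
ams_service_url() {
  printf 'http://%s:8080/api/v1/clusters/%s/services/AMBARI_METRICS\n' "$1" "$2"
}

# Send the DELETE request from step 3.
# $1 = Ambari server host, $2 = cluster name, $3 = user:password
delete_ams_service() {
  curl --user "$3" -i -H "X-Requested-By: ambari" -X DELETE "$(ams_service_url "$1" "$2")"
}

# Example (replace the cluster name and credentials with your own):
# delete_ams_service "$(hostname -f)" CLUSTERNAME admin:admin
```

Printing the URL with `ams_service_url` first is a cheap way to confirm the cluster name before anything is deleted.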


31 REPLIES

Master Mentor

@Prakash Punj What's in the log files?

cd /var/log/ambari-metrics-collector/

and please check logs and look for errors

Expert Contributor
@Neeraj Sabharwal

Below is what's in the log (ambari-metrics-collector.log)

2016-02-10 20:38:55,744 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:38:55,745 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=18, retries=35, started=209288 ms ago, cancelled=false, msg=
2016-02-10 20:39:15,811 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:39:15,811 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229354 ms ago, cancelled=false, msg=


avatar

What Ambari version is this?

Expert Contributor

@Jonas Straub

Ambari version is 2.2.0.0

Thanks

prakash

avatar

If you have a kerberized environment, make sure hbase.regionserver.kerberos.principal and hbase.master.kerberos.principal are the same; a mismatch has caused issues in the past.

Also set zookeeper.znode.parent to /ams-hbase if you have NOT kerberized your environment; otherwise set it to /ams-hbase-secure.

Stop all Ambari Metrics components, log into the machine, and make sure no metrics process is running (ps aux | grep metrics).

Start Metrics again and check the HBase Master and Metrics Collector logs (both in /var/log/ambari-metrics-collector/....)

Are you using distributed or embedded mode?

Could you please post the following configurations:

  • hbase.rootdir
  • hbase.cluster.distributed
  • Metrics service operation mode
  • hbase.zookeeper.property.clientPort
  • hbase.zookeeper.quorum

Thanks!

Expert Contributor
@Jonas Straub

I changed /hbase to /ams-hbase and restarted, but no success.

hbase.rootdir: hdfs://hdp-m.samitsolutions.com:8020/apps/hbase/data
hbase.cluster.distributed: TRUE
Metrics service operation mode: embedded
hbase.zookeeper.property.clientPort: 2181
hbase.zookeeper.quorum: hdp-m.samitsolutions.com

Output of the metrics-collector log:

2016-02-11 20:34:32,065 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-11 20:34:32,065 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229019 ms ago, cancelled=false, msg=

Output of ps aux | grep metrics:

kafka     6776  1.4  5.0 4666216 404744 ?      Sl   Feb09  41:25 /usr/jdk64/jdk1.8.0_60/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/2.3.4.0-3485/kafka/bin/../config/log4j.properties -cp :/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/hdp/2.3.4.0-3485/kafka/bin/../libs/* kafka.Kafka /usr/hdp/2.3.4.0-3485/kafka/config/server.properties
root     18818  3.0  0.1 352240 15196 ?        S    20:13   0:01 /usr/bin/python2 /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py START /var/lib/ambari-agent/data/auto_command-1454973527.json /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package /var/lib/ambari-agent/data/structured-out-1454973527.json INFO /var/lib/ambari-agent/tmp
ams      19096 17.3  1.1 3779588 90244 ?       Sl   20:14   0:04 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_zookeeper -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201602112014 -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-zookeeper-hdp-s2.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.zookeeper.HQuorumPeer start

root     19399  0.0  0.0  11300  1360 ?        S    20:14   0:00 /bin/bash /var/lib/ambari-agent/ambari-sudo.sh su ams -l -s /bin/bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
root     19400  0.0  0.0  48136  1488 ?        S    20:14   0:00 su ams -l -s /bin/bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19406  0.1  0.0 108164  1584 ?        Ss   20:14   0:00 -bash -c export  PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19430  0.2  0.0 106196  1564 ?        S    20:14   0:00 bash /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams      19465 59.0  3.7 3905584 306212 ?      Sl   20:14   0:14 /usr/jdk64/jdk1.8.0_60/bin/java -Xms1024m -Xmx1024m -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-201602112014 -cp /usr/lib/ambari-metrics-collector/*:/etc/ambari-metrics-collector/conf -Djava.net.preferIPv4Stack=true -Dams.log.dir=/var/log/ambari-metrics-collector -Dproc_timelineserver org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
ams      19499  185  2.7 2032776 218940 ?      Sl   20:14   0:40 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_shell -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -Dhbase.ruby.sources=/usr/lib/ams-hbase/bin/../lib/ruby -Xmx256m -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.id.str= -Dhbase.root.logger=INFO,console -Dhbase.security.logger=INFO,NullAppender org.jruby.Main -X+O /usr/lib/ams-hbase/bin/../bin/hirb.rb
root     19618  0.0  0.0 103308   904 pts/1    S+   20:14   0:00 grep metrics


avatar

Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"?
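
The pairing behind this suggestion can be written down as a quick check. This is only a sketch: it assumes embedded mode goes with hbase.cluster.distributed=false and distributed mode with true, which is what the reply above implies, and `check_ams_mode` is a hypothetical helper:

```shell
# Report whether the operation mode agrees with hbase.cluster.distributed.
# $1 = operation mode (embedded|distributed)
# $2 = hbase.cluster.distributed (true|false, any letter case)
check_ams_mode() {
  _mode=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  _dist=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$_mode:$_dist" in
    embedded:false|distributed:true) echo "consistent" ;;
    embedded:true|distributed:false) echo "mismatch" ;;
    *) echo "unknown"; return 2 ;;
  esac
}

# Example with the values posted above:
# check_ams_mode embedded TRUE
```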

Master Mentor

@Prakash Punj

Edit /conf/hbase-site.xml in your HBase config directory and set this property, or add it if it is missing:

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>YOUR ZOOKEEPER FOLDER</value>
</property>

For example:

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/myname/zookeeper-3.4.6</value>
</property>

Expert Contributor
@Geoffrey Shelton Oko

@Neeraj Sabharwal

Still the same result. I am wondering whether it is a good idea to wipe out all the ambari-metrics components and reinstall the service. What is the clean way of doing that?

Thanks

Prakash

avatar

You don't have to remove and reinstall the Ambari Metrics service from Ambari; I am pretty sure this will not solve the problem!

Please see my comment above => Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"

If this is a new installation, you can try to remove all Metrics data:

  1. Stop Ambari Metrics (Collector + all monitors)
  2. Make sure no Metrics process is running (you can kill all processes belonging to user "ams")
  3. Remove data from HDFS (hdfs dfs -rm -r hdfs://hdp-m.samitsolutions.com:8020/apps/hbase/data)
  4. Remove data from zookeeper (login: zookeeper-client -server hdp-m.samitsolutions.com:2181; removal: rmr /<hbase znode>)
  5. Start the Ambari Metrics Collector (not the monitors!)
  6. See if the collector starts; if not, please upload the hbase-master and ambari-metrics-collector logs
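
For step 2, filtering by owner is narrower than `grep metrics`, which in the ps output posted earlier also matched the Kafka broker through its metrics sink jar. A small sketch, assuming `ps aux` column order (user first, PID second); `pids_owned_by` is a hypothetical helper:

```shell
# Print the PIDs of processes owned by one user.
# Reads `ps aux` output from stdin; $1 = user name (this thread uses "ams").
pids_owned_by() {
  awk -v user="$1" '$1 == user { print $2 }'
}

# Example: list what step 2 would target, without killing anything.
# ps aux | pids_owned_by ams
```

Reviewing the printed PIDs before passing them to kill keeps unrelated services out of the cleanup.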

Is this a secured (kerberized) or unsecured (no kerberos) cluster?

There are other steps we can try, but let's try the above first.

Thanks