Created 02-10-2016 12:46 AM
When I start the Ambari Metrics Collector, no error is reported during startup, but it never actually comes up. When I checked the log file, this is what I see:
The value of zookeeper.znode.parent is: /hbase-unsecure
retries=35, started=229269 ms ago, cancelled=false, msg=
2016-02-09 19:39:15,043 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:15,043 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=20, retries=35, started=249286 ms ago, cancelled=false, msg=
2016-02-09 19:39:35,080 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:35,081 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=21, retries=35, started=269324 ms ago, cancelled=false, msg=
2016-02-09 19:39:55,151 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:39:55,152 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=22, retries=35, started=289395 ms ago, cancelled=false, msg=
2016-02-09 19:40:15,192 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:15,193 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=23, retries=35, started=309436 ms ago, cancelled=false, msg=
2016-02-09 19:40:35,275 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:35,276 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=24, retries=35, started=329519 ms ago, cancelled=false, msg=
2016-02-09 19:40:55,283 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:40:55,283 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=25, retries=35, started=349526 ms ago, cancelled=false, msg=
2016-02-09 19:41:15,437 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:15,437 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=26, retries=35, started=369680 ms ago, cancelled=false, msg=
2016-02-09 19:41:35,604 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:35,604 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=27, retries=35, started=389847 ms ago, cancelled=false, msg=
2016-02-09 19:41:55,633 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:41:55,635 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=28, retries=35, started=409878 ms ago, cancelled=false, msg=
2016-02-09 19:42:15,646 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:15,646 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=29, retries=35, started=429889 ms ago, cancelled=false, msg=
2016-02-09 19:42:35,841 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:35,841 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=30, retries=35, started=450084 ms ago, cancelled=false, msg=
2016-02-09 19:42:56,032 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:42:56,032 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=31, retries=35, started=470275 ms ago, cancelled=false, msg=
2016-02-09 19:43:16,088 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:43:16,088 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=32, retries=35, started=490331 ms ago, cancelled=false, msg=
2016-02-09 19:43:36,265 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-09 19:43:36,265 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=33, retries=35, started=510508 ms ago, cancelled=false, msg=
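To confirm which znode the running collector is actually asking for (here /hbase, although the configured value is /hbase-unsecure), you can pull it out of the log together with the retry counter. This is a minimal Python sketch: the regular expressions are written against the messages pasted above rather than any documented log format, and `summarize_znode_errors` is a hypothetical helper name.

```python
import re

# Matches the error HBase's client logs when the parent znode it expects is missing
# (message format taken from the log excerpt above).
MISSING_ZNODE = re.compile(r"The node (\S+) is not in ZooKeeper")
# Matches the retry counter RpcRetryingCaller logs after each failed call.
RETRY = re.compile(r"tries=(\d+)\s*,\s*retries=(\d+)")


def summarize_znode_errors(log_text: str) -> dict:
    """Summarize 'node is not in ZooKeeper' errors in AMS collector log text.

    Returns the set of znode paths the client looked for, and the highest
    (tries, retries) pair seen, or None if no retry lines were found.
    """
    missing = set(MISSING_ZNODE.findall(log_text))
    attempts = [(int(t), int(r)) for t, r in RETRY.findall(log_text)]
    return {
        "missing_znodes": missing,
        "last_attempt": max(attempts) if attempts else None,
    }
```

If `missing_znodes` differs from the zookeeper.znode.parent you configured, the running client is not using the value you set, which fits the stale-process explanation given in the accepted answer.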
Created 02-29-2016 03:42 AM
I could finally solve it by combining some of the steps mentioned above.
I first checked the value of `zookeeper.znode.parent` in HBase. Setting the same value in Ambari did not work at first, because some metrics processes were already running on that machine and were caching the `/hbase` value. So I had to find them with `ps -ef | grep metrics` and kill all of them.
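Comparing the HBase value with the one Ambari Metrics uses can also be scripted. A minimal sketch that reads `zookeeper.znode.parent` from two Hadoop-style *-site.xml documents and compares them. It assumes the cluster's HBase config is /etc/hbase/conf/hbase-site.xml (a typical HDP location, not taken from this thread) and the AMS copy is /etc/ambari-metrics-collector/conf/hbase-site.xml (that directory is on the collector's classpath in the ps output later in this thread); `read_property` and `znode_parents_match` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ZNODE_KEY = "zookeeper.znode.parent"


def read_property(site_xml: str, name: str) -> Optional[str]:
    """Return the <value> of property `name` in Hadoop-style *-site.xml text.

    Returns "" for an empty <value/> and None if the property is absent.
    """
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def znode_parents_match(hbase_site: str, ams_hbase_site: str) -> bool:
    """True if both configs name the same ZooKeeper parent znode."""
    return read_property(hbase_site, ZNODE_KEY) == read_property(ams_hbase_site, ZNODE_KEY)
```

Usage, under the path assumptions above: read each file with `open(path).read()` and pass the two strings to `znode_parents_match`. An empty string for the AMS side corresponds to the empty default mentioned in step 7.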
Watch the Ambari Metrics Collector log (/var/log/ambari-metrics-collector/ambari-metrics-collector.log) while you do the steps below.
Steps:
0. tail -f /var/log/ambari-metrics-collector/ambari-metrics-collector.log
1. Stop the Ambari Metrics service (Ambari Server itself has to be running for the REST call in step 3)
2. Kill all the metrics processes
3. curl --user admin:admin -i -H "X-Requested-By: ambari" -X DELETE http://`hostname -f`:8080/api/v1/clusters/CLUSTERNAME/services/AMBARI_METRICS
=> Make sure you replace CLUSTERNAME with your cluster name
4. Refresh the Ambari UI
5. Add Service
6. Select Ambari Metrics
7. In the configuration screen, make sure to set `zookeeper.znode.parent` to the value configured in the HBase service. By default, Ambari Metrics leaves it empty.
8. Deploy
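Step 3 can also be issued from Python instead of curl. The sketch below only builds the request with the standard library; the host, cluster name, and credentials are placeholders, the default port 8080, the AMBARI_METRICS service name, and the `X-Requested-By: ambari` header mirror the curl command in step 3, and `build_delete_service_request` is a hypothetical helper name.

```python
import base64
import urllib.request


def build_delete_service_request(host: str, cluster: str, user: str, password: str,
                                 service: str = "AMBARI_METRICS",
                                 port: int = 8080) -> urllib.request.Request:
    """Build (but do not send) the Ambari REST call from step 3 that deletes a service."""
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}/services/{service}"
    # HTTP basic auth, equivalent to curl --user user:password
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            # Same header as the curl command; Ambari expects it on modifying requests.
            "X-Requested-By": "ambari",
            "Authorization": f"Basic {token}",
        },
    )
```

To actually send it, pass the request to `urllib.request.urlopen`. `socket.getfqdn()` is roughly the Python counterpart of `hostname -f`. Keeping construction separate from sending lets you print and check the URL before deleting anything.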
Created 02-11-2016 12:52 AM
@Prakash Punj What's in the log files?
cd /var/log/ambari-metrics-collector/
and check the logs for errors.
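Because the same messages repeat every 20 seconds or so, it helps to collapse the log into distinct ERROR/FATAL messages with counts. A sketch, assuming one log entry per line as in the actual log file, and a `date time,millis LEVEL logger: message` layout inferred from the excerpts in this thread; `distinct_errors` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: "2016-02-10 20:38:55,744 ERROR <logger>: <message>"
ENTRY = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (ERROR|FATAL) (\S+?): (.*)")


def distinct_errors(lines: Iterable[str]) -> Counter:
    """Count ERROR/FATAL entries keyed by (logger, message)."""
    counts: Counter = Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m:
            counts[(m.group(3), m.group(4).strip())] += 1
    return counts
```

Usage: open /var/log/ambari-metrics-collector/ambari-metrics-collector.log, pass the file object, and print `most_common()`; a single dominant entry such as the missing /hbase znode shows where to look first.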
Created 02-11-2016 01:46 AM
Below is what's in the log (ambari-metrics-collector.log)
2016-02-10 20:38:55,744 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:38:55,745 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=18, retries=35, started=209288 ms ago, cancelled=false, msg=
2016-02-10 20:39:15,811 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-10 20:39:15,811 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229354 ms ago, cancelled=false, msg=
Created 02-10-2016 07:00 PM
What Ambari version is this?
Created 02-10-2016 10:20 PM
Created 02-11-2016 06:37 AM
If you have a kerberized environment, make sure hbase.regionserver.kerberos.principal and hbase.master.kerberos.principal are the same; a mismatch here has caused issues in the past.
Also, set zookeeper.znode.parent to /ams-hbase if your environment is NOT kerberized; otherwise set it to /ams-hbase-secure.
Stop all Ambari Metrics components, log into the machine, and make sure no metrics process is still running (ps aux | grep metrics).
Start Metrics again and check the HBase Master and Metrics Collector logs (both in /var/log/ambari-metrics-collector/...).
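The two configuration rules in this reply can be written down as a small check. This is a sketch that assumes the AMS HBase settings are already loaded into a dict of property name to value; `check_ams_security_settings` is a hypothetical name, and the expected znode values are exactly the ones recommended above, not values read from Ambari.

```python
from typing import Dict, List


def check_ams_security_settings(config: Dict[str, str], kerberized: bool) -> List[str]:
    """Return human-readable problems with AMS HBase settings, per the advice above."""
    problems = []
    # Rule: /ams-hbase-secure on kerberized clusters, /ams-hbase otherwise.
    expected = "/ams-hbase-secure" if kerberized else "/ams-hbase"
    actual = config.get("zookeeper.znode.parent", "")
    if actual != expected:
        problems.append(f"zookeeper.znode.parent is {actual!r}, expected {expected!r}")
    # Rule: on kerberized clusters the master and regionserver principals should match.
    if kerberized:
        master = config.get("hbase.master.kerberos.principal")
        region = config.get("hbase.regionserver.kerberos.principal")
        if master != region:
            problems.append("hbase.master.kerberos.principal and "
                            "hbase.regionserver.kerberos.principal differ")
    return problems
```

An empty list means both rules are satisfied; it says nothing about whether stale processes are still running, which is why the stop/check/start steps above still apply.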
Are you running in distributed or embedded mode?
Could you please post the following configurations:
Thanks!
Created 02-12-2016 01:35 AM
I changed /hbase to /ams-hbase and restarted, but with no success.
hbase.rootdir: hdfs://hdp-m.samitsolutions.com:8020/apps/hbase/data
hbase.cluster.distributed: TRUE
Metrics service operation mode: embedded
hbase.zookeeper.property.clientPort: 2181
hbase.zookeeper.quorum: hdp-m.samitsolutions.com
Output of the metrics-collector log:
2016-02-11 20:34:32,065 ERROR org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2016-02-11 20:34:32,065 INFO org.apache.hadoop.hbase.client.RpcRetryingCaller: Call exception, tries=19, retries=35, started=229019 ms ago, cancelled=false, msg=
Output of ps aux | grep metrics:
kafka 6776 1.4 5.0 4666216 404744 ? Sl Feb09 41:25 /usr/jdk64/jdk1.8.0_60/bin/java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/var/log/kafka/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/var/log/kafka -Dlog4j.configuration=file:/usr/hdp/2.3.4.0-3485/kafka/bin/../config/log4j.properties -cp :/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/lib/ambari-metrics-kafka-sink/ambari-metrics-kafka-sink.jar:/usr/lib/ambari-metrics-kafka-sink/lib/*:/usr/hdp/2.3.4.0-3485/kafka/bin/../libs/* kafka.Kafka /usr/hdp/2.3.4.0-3485/kafka/config/server.properties
root 18818 3.0 0.1 352240 15196 ? S 20:13 0:01 /usr/bin/python2 /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py START /var/lib/ambari-agent/data/auto_command-1454973527.json /var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package /var/lib/ambari-agent/data/structured-out-1454973527.json INFO /var/lib/ambari-agent/tmp
ams 19096 17.3 1.1 3779588 90244 ? Sl 20:14 0:04 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_zookeeper -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201602112014 -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-zookeeper-hdp-s2.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.zookeeper.HQuorumPeer start
root 19399 0.0 0.0 11300 1360 ? S 20:14 0:00 /bin/bash /var/lib/ambari-agent/ambari-sudo.sh su ams -l -s /bin/bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
root 19400 0.0 0.0 48136 1488 ? S 20:14 0:00 su ams -l -s /bin/bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams 19406 0.1 0.0 108164 1584 ? Ss 20:14 0:00 -bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent' ; /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams 19430 0.2 0.0 106196 1564 ? S 20:14 0:00 bash /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf --distributed start
ams 19465 59.0 3.7 3905584 306212 ? Sl 20:14 0:14 /usr/jdk64/jdk1.8.0_60/bin/java -Xms1024m -Xmx1024m -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-201602112014 -cp /usr/lib/ambari-metrics-collector/*:/etc/ambari-metrics-collector/conf -Djava.net.preferIPv4Stack=true -Dams.log.dir=/var/log/ambari-metrics-collector -Dproc_timelineserver org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
ams 19499 185 2.7 2032776 218940 ? Sl 20:14 0:40 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_shell -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -Dhbase.ruby.sources=/usr/lib/ams-hbase/bin/../lib/ruby -Xmx256m -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase.log -Dhbase.home.dir=/usr/lib/ams-hbase/bin/.. -Dhbase.id.str= -Dhbase.root.logger=INFO,console -Dhbase.security.logger=INFO,NullAppender org.jruby.Main -X+O /usr/lib/ams-hbase/bin/../bin/hirb.rb
root 19618 0.0 0.0 103308 904 pts/1 S+ 20:14 0:00 grep metrics
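Note that `ps aux | grep metrics` also matches processes that are not part of Ambari Metrics: in the output above, the Kafka broker matches only because its classpath includes ambari-metrics-kafka-sink, and the root-owned lines are the Ambari agent, the sudo/su wrappers, and the grep itself. Filtering by owner is safer before killing anything. A sketch, assuming the AMS service user is `ams` as in this output; `ams_processes` is a hypothetical name.

```python
from typing import List, Tuple


def ams_processes(ps_aux_output: str, owner: str = "ams") -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for `ps aux` lines owned by `owner`."""
    found = []
    for line in ps_aux_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND;
        # maxsplit=10 keeps the full command line (with its spaces) in the last field.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[0] == owner:
            found.append((int(fields[1]), fields[10]))
    return found
```

Feed it `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`, review the list, and only then send `os.kill(pid, signal.SIGTERM)` to each PID you actually want to stop.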
Created 02-12-2016 07:25 AM
Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"?
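The mismatch pointed out here (hbase.cluster.distributed is TRUE while the operation mode is embedded) can be checked mechanically. A sketch under two assumptions: the "Metrics service operation mode" field maps to the ams-site property `timeline.metrics.service.operation.mode`, and a missing mode means embedded. `check_operation_mode` is a hypothetical name.

```python
from typing import Dict, List

# Assumed ams-site property behind the "Metrics service operation mode" field.
MODE_KEY = "timeline.metrics.service.operation.mode"


def check_operation_mode(ams_site: Dict[str, str], ams_hbase_site: Dict[str, str]) -> List[str]:
    """Flag an operation mode that disagrees with hbase.cluster.distributed."""
    mode = ams_site.get(MODE_KEY, "embedded").strip().lower()
    # Hadoop-style booleans may be written as "true", "TRUE", etc.
    distributed = ams_hbase_site.get("hbase.cluster.distributed", "false").strip().lower() == "true"
    problems = []
    if distributed and mode != "distributed":
        problems.append(f"hbase.cluster.distributed is true but {MODE_KEY} is {mode!r}")
    if mode == "distributed" and not distributed:
        problems.append(f"{MODE_KEY} is 'distributed' but hbase.cluster.distributed is not true")
    return problems
```

With the values posted in this thread (mode embedded, hbase.cluster.distributed TRUE), this returns one problem, matching the suggestion above.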
Created 02-12-2016 06:07 AM
Edit hbase-site.xml in your HBase configuration directory (conf/) and add:
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>YOUR ZOOKEEPER FOLDER</value>
</property>
For example:
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/myname/zookeeper-3.4.6</value>
</property>
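If you prefer to script this change instead of editing hbase-site.xml by hand, here is a sketch using Python's ElementTree. `set_property` is a hypothetical helper; the default ElementTree parser drops XML comments, and on an Ambari-managed cluster the file is regenerated from Ambari's stored configuration, so a hand edit may be overwritten on the next restart (changing the value in the Ambari UI avoids that).

```python
import xml.etree.ElementTree as ET


def set_property(site_xml: str, name: str, value: str) -> str:
    """Set (or add) a <property> in Hadoop-style *-site.xml text and return the new XML."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing property with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For the example above, call `set_property(xml_text, "hbase.zookeeper.property.dataDir", "/home/myname/zookeeper-3.4.6")` and write the result back to the file.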
Created 02-12-2016 08:09 PM
Still the same result. I'm wondering whether it's a good idea to wipe out all the Ambari Metrics components and reinstall the service. What's the clean way of doing it?
Thanks
Prakash
Created 02-13-2016 08:02 AM
You don't have to remove and reinstall the Ambari Metrics service from Ambari; I am pretty sure this will not solve the problem!
Please see my comment above => Since hbase.cluster.distributed is true, could you please change "Metrics service operation mode" to "distributed"?
If this is a new installation, you can try to remove all Metrics data:
Is this a secured (kerberized) or unsecured (no kerberos) cluster?
There are other steps we can try, but let's try the above first.
Thanks