Created 06-15-2017 08:17 AM
Hi,
I'm trying to follow the tutorial about Ambari Metrics (http://bryanbende.com/development/2015/07/31/ambari-metrics-part1-metrics-collector), using the HDP sandbox.
But I got stuck with the very first steps 😞
I set up port forwarding successfully, but when I tried starting Ambari Metrics it didn't complain about anything and displayed a success status in the progress bar, yet stayed in "stopped" status afterward.
It took me a while to figure out that this was (maybe) because HBase is not running out of the box in the HDP sandbox. After I started HBase, Ambari Metrics started successfully, but automatically stopped after a few minutes...
My questions are quite simple:
Thanks for your help
Created 06-15-2017 08:42 AM
By default AMS uses an embedded HBase, so it is not dependent on the external HBase service. You can double-check this with the following "ams-site" setting. Please see: https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode
timeline.metrics.service.operation.mode = embedded
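For example, on the sandbox you can confirm the effective value directly on the collector host (assuming the default config location /etc/ambari-metrics-collector/conf):
# grep -A1 "timeline.metrics.service.operation.mode" /etc/ambari-metrics-collector/conf/ams-site.xml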
If AMS is going down again and again, then one reason could be limited availability of resources (RAM). Can you please check if you have enough free memory available on your sandbox?
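For a quick look at free memory and the top memory consumers on the sandbox, something like:
# free -m
# top -b -n 1 | head -n 20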
For testing, you can stop unneeded services like Oozie/Hive and then see if the AMS collector continues to run.
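If you prefer the command line over the Ambari UI, a service can also be stopped through the Ambari REST API. A sketch, assuming the default admin/admin credentials and the sandbox cluster name "Sandbox" (yours may differ):
# curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
    -d '{"RequestInfo":{"context":"Stop OOZIE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
    http://localhost:8080/api/v1/clusters/Sandbox/services/OOZIE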
Created 06-15-2017 09:11 AM
@Jay Thanks a lot for your quick answer!
I checked the property you mentioned and it indeed has the "embedded" value => I stopped the HBase service on the sandbox. I also stopped Flume, Hive, Spark2 and Knox, but still the same behavior: the Metrics Collector starts, and goes down after a while... Is there a particular place (logs...) to check if my VM runs out of memory? (I dedicated 4 cores and 16GB to the sandbox, but I don't know how much Ambari Metrics needs to work...)
Your comments also raise an additional question: in distributed mode, the AMS "embedded hbase" writes metrics to HDFS instead of local disk. Does that mean AMS never uses the standard HBase service of the cluster, but only an embedded HBase, or is there a way to change this somehow?
Thanks again
Created 06-15-2017 09:26 AM
If the AMS collector is going down continuously, then please check the following logs for any errors or warnings:
# less /var/log/ambari-metrics-collector/ambari-metrics-collector.log
# less /var/log/ambari-metrics-collector/ambari-metrics-collector.out
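To quickly surface only the recent problems, something like:
# grep -iE "ERROR|FATAL" /var/log/ambari-metrics-collector/ambari-metrics-collector.log | tail -n 20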
In case of the default embedded mode of AMS, you do not need to start HBase separately; it will start an HBase instance of its own.
Example: Notice "HMaster"
# ps -ef | grep ^ams
ams 29300 29286 7 Jun14 ? 00:50:46 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201706142141 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-kamb25103.example.com.log -Dhbase.home.dir=/usr/lib/ams-hbase/ -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
In case of AMS you can check the following property to know where AMS is writing the data. By default, in embedded mode, you will find the following value:
hbase.rootdir = file:///var/lib/ambari-metrics-collector/hbase
NOTE:
The Ambari Metrics service uses HBase as its default storage backend. Set hbase.rootdir for HBase to either a local filesystem path if using Ambari Metrics in embedded mode, or to an HDFS dir, for example: hdfs://namenode.example.org:9000/hbase. By default HBase writes into /tmp. Change this configuration, otherwise all data will be lost on machine restart.
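As an illustration, switching AMS to distributed mode typically means changing values along these lines (a sketch based on the wiki page linked above; the exact NameNode address and HDFS path are assumptions that depend on your cluster):
timeline.metrics.service.operation.mode = distributed (in ams-site)
hbase.cluster.distributed = true (in ams-hbase-site)
hbase.rootdir = hdfs://sandbox.hortonworks.com:8020/user/ams/hbase (in ams-hbase-site)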
As it is a sandbox instance, as a quick attempt you can try cleaning up the AMS data as described in the following doc and then restart AMS:
https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data
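For embedded mode the cleanup roughly boils down to the following (a sketch under the default paths shown above; the linked wiki page is the authoritative procedure). Stop AMS from the Ambari UI first, then on the collector host:
# mv /var/lib/ambari-metrics-collector/hbase /tmp/ams-hbase-backup
# mv /var/lib/ambari-metrics-collector/hbase-tmp /tmp/ams-hbase-tmp-backup
Then start AMS again from Ambari and the embedded HBase will recreate its data directories.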
Created 06-15-2017 01:08 PM
Well, I did clean everything as suggested and double-checked the configuration you mentioned, but the same problem still occurs.
The ps command returns this:
[root@sandbox ~]# ps -ef | grep ^ams
ams 14734 1 0 19:38 ? 00:00:00 bash /usr/lib/ams-hbase/bin/hbase-daemon.sh --config /etc/ams-hbase/conf foreground_start master
ams 14748 14734 10 19:38 ? 00:00:26 /usr/lib/jvm/java/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201706141938 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-sandbox.hortonworks.com.log -Dhbase.home.dir=/usr/lib/ams-hbase/ -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
And the ambari-metrics-collector.log file (cf. attachment ambari-metrics-collector.zip) contains several errors. The latest ones are:
2017-06-14 19:39:29,423 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.NullPointerException
    at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
    at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
2017-06-14 19:39:29,430 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked
Any clue about this? Could it be the reason why AMS automatically stops?
Thanks
Created 06-15-2017 01:23 PM
@Sebastien Chausson Please change the "timeline.metrics.service.webapp.address" value to "0.0.0.0:6188". It looks like you might incorrectly have it set to "0.0.0.0::host_group_1%:6188" somewhere, as I see the following error:
Caused by: java.lang.IllegalArgumentException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
    at java.net.URI.create(URI.java:852)
    at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:297)
    at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:395)
    at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:180)
    ... 4 more
Caused by: java.net.URISyntaxException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.scanEscape(URI.java:2978)
    at java.net.URI$Parser.scan(URI.java:3001)
    at java.net.URI$Parser.parseAuthority(URI.java:3142)
    at java.net.URI$Parser.parseHierarchical(URI.java:3097)
    at java.net.URI$Parser.parse(URI.java:3053)
    at java.net.URI.<init>(URI.java:588)
    at java.net.URI.create(URI.java:850)
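If you would rather change it from the command line than the Ambari UI, the configs.sh helper shipped with Ambari can update the property. A sketch, assuming the default admin/admin credentials and the sandbox cluster name "Sandbox" (adjust to your setup), followed by an AMS restart so the change takes effect:
# /var/lib/ambari-server/resources/scripts/configs.sh set localhost Sandbox ams-site \
    "timeline.metrics.service.webapp.address" "0.0.0.0:6188"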
Created 06-15-2017 02:34 PM
Thanks a lot, it fixed the problem. Don't know why this property was set like this "out of the box" in the sandbox...