Ambari metrics not starting on sandbox

Expert Contributor

Hi,

I'm trying to follow a tutorial about Ambari Metrics (http://bryanbende.com/development/2015/07/31/ambari-metrics-part1-metrics-collector), using the HDP sandbox.

But I got stuck at the very first steps 😞

I completed the port forwarding steps successfully, but when I tried starting Ambari Metrics, it didn't complain about anything and showed a success status in the progress bar, yet it stayed in "stopped" status afterward.

It took me a while to figure out that this was (maybe) because HBase is not running out of the box in the HDP sandbox. After I started HBase, Ambari Metrics started successfully, but stopped automatically after a few minutes...

My questions are quite simple:

  • Is it possible to run Ambari Metrics without HBase (in the context of the HDP sandbox)?
  • How do HBase and Ambari Metrics "cooperate"? HBase seems to send data to Ambari Metrics, while Ambari Metrics relies on HBase (or did I miss something?)

Thanks for your help

1 ACCEPTED SOLUTION

Master Mentor

@Sebastien Chausson Please change the "timeline.metrics.service.webapp.address" value to "0.0.0.0:6188". It looks like it is incorrectly set to "0.0.0.0::host_group_1%:6188" somewhere, because I see the following error in your log:

Caused by: java.lang.IllegalArgumentException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
   at java.net.URI.create(URI.java:852)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:297)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:395)
   at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:180)
   ... 4 more
Caused by: java.net.URISyntaxException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
   at java.net.URI$Parser.fail(URI.java:2848)
   at java.net.URI$Parser.scanEscape(URI.java:2978)
   at java.net.URI$Parser.scan(URI.java:3001)
   at java.net.URI$Parser.parseAuthority(URI.java:3142)
   at java.net.URI$Parser.parseHierarchical(URI.java:3097)
   at java.net.URI$Parser.parse(URI.java:3053)
   at java.net.URI.<init>(URI.java:588)
   at java.net.URI.create(URI.java:850)
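
The "Malformed escape pair at index 28" message points at the "%" character in the configured address: in a URI, a "%" must be followed by two hexadecimal digits, and here it is followed by ":6". The sketch below is not the Java parser. It is a small Python approximation (the function name first_malformed_escape is made up for this illustration) that locates the same offending character.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def first_malformed_escape(url: str) -> int:
    """Return the index of the first '%' not followed by two hex digits, or -1.

    Hypothetical helper for illustration only; it mimics one check that a
    URI parser performs, not the whole parsing process.
    """
    for index, char in enumerate(url):
        if char != "%":
            continue
        pair = url[index + 1:index + 3]
        if len(pair) < 2 or any(c not in HEX_DIGITS for c in pair):
            return index
    return -1


# URL copied from the stack trace above.
bad_url = "http://0.0.0.0::host_group_1%:6188:0"
print(first_malformed_escape(bad_url))                # 28
print(first_malformed_escape("http://0.0.0.0:6188"))  # -1
```

The reported index matches the position of the "%" left over from the "host_group_1%" placeholder text.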

6 REPLIES

Master Mentor

@Sebastien Chausson

By default AMS uses an embedded HBase, so it does not depend on the external HBase service. You can verify this by checking the following setting in "ams-site". Please see: https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode

timeline.metrics.service.operation.mode = embedded

If AMS keeps going down, one reason could be a shortage of resources (RAM). Can you please check whether you have enough free memory available on your sandbox?

For testing, you can stop services you do not need, such as Oozie or Hive, and then see whether AMS keeps running.
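
One rough way to check free memory on a Linux guest such as the sandbox is to read /proc/meminfo. The following is a minimal sketch under stated assumptions: the helper names are illustrative, the sample figures are invented for the example, and "MemAvailable" may be missing on older kernels, in which case the code falls back to "MemFree".

```python
def parse_meminfo(text: str) -> dict:
    """Map field name -> value in kB from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_mb(text: str) -> int:
    """Return available memory in MB, preferring MemAvailable over MemFree."""
    info = parse_meminfo(text)
    kilobytes = info.get("MemAvailable", info.get("MemFree", 0))
    return kilobytes // 1024


# Invented sample values, only to show the expected input format.
sample = """MemTotal:       16333780 kB
MemFree:          524288 kB
MemAvailable:    2097152 kB
"""
print(available_mb(sample))  # 2048
```

On the sandbox itself, the same function can be fed the real file, for example available_mb(open("/proc/meminfo").read()).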

Expert Contributor

@Jay Thanks a lot for your quick answer!

I checked the property you mentioned and it does have the value "embedded", so I stopped the HBase service on the sandbox. I also stopped Flume, Hive, Spark2, and Knox, but the behavior is still the same: the metrics collector starts, then goes down after a while... Is there a particular place (logs...) to check whether my VM is running out of memory? (I dedicated 4 cores and 16 GB to the sandbox, but I don't know how much Ambari Metrics needs to work...)

Your comments also raise an additional question: in distributed mode, the AMS "embedded HBase" writes metrics to HDFS instead of local disk. Does that mean AMS never uses the cluster's standard HBase service, only an embedded HBase, or is there a way to change this?

Thanks again

Master Mentor

@Sebastien Chausson

If the AMS collector keeps going down, please check the following logs for errors or warnings:

# less /var/log/ambari-metrics-collector/ambari-metrics-collector.log 
# less /var/log/ambari-metrics-collector/ambari-metrics-collector.out
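
Paging through a long log with less can be slow, so a short filter for the interesting lines may help. This is a minimal sketch, assuming that relevant lines either contain the " ERROR " level marker or start with "Caused by:"; the function name is hypothetical and the sample lines are log lines quoted elsewhere in this thread.

```python
from typing import Iterable, List


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines logged at ERROR level and 'Caused by:' lines of stack traces."""
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        if " ERROR " in text or text.startswith("Caused by:"):
            kept.append(text)
    return kept


# Sample lines taken from the collector log excerpts posted in this thread.
sample_log = [
    "2017-06-14 19:39:29,423 ERROR org.apache.helix.controller.GenericHelixController: "
    "ClusterEventProcessor failed while running the controller pipeline",
    "java.lang.NullPointerException",
    "Caused by: java.net.URISyntaxException: Malformed escape pair at index 28: "
    "http://0.0.0.0::host_group_1%:6188:0",
]
for hit in find_error_lines(sample_log):
    print(hit)
```

To run it against the real file, pass an open file object instead of the sample list, for example find_error_lines(open("/var/log/ambari-metrics-collector/ambari-metrics-collector.log")).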

In the default embedded mode, you do not need to start HBase separately; AMS starts its own HBase instance.

Example (notice "HMaster" at the end of the command line):

# ps -ef | grep ^ams

ams      29300 29286  7 Jun14 ?        00:50:46 /usr/jdk64/jdk1.8.0_112/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201706142141 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-kamb25103.example.com.log -Dhbase.home.dir=/usr/lib/ams-hbase/ -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
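
The two details worth reading out of such a command line are whether it is the HMaster process and how much heap it was given. Here is a small sketch that extracts both; the function name is hypothetical, and the sample command line is shortened from the ps output above.

```python
import re
from typing import Optional


def describe_ams_process(cmdline: str) -> dict:
    """Report whether a command line is an HBase HMaster and its -Xmx value."""
    heap: Optional[re.Match] = re.search(r"-Xmx(\d+[kKmMgG]?)", cmdline)
    return {
        "is_hmaster": "org.apache.hadoop.hbase.master.HMaster" in cmdline,
        "max_heap": heap.group(1) if heap else None,
    }


# Shortened from the ps output above; most JVM flags are left out.
cmdline = (
    "/usr/jdk64/jdk1.8.0_112/bin/java -Dproc_master -Xms1536m -Xmx1536m "
    "-Xmn256m org.apache.hadoop.hbase.master.HMaster start"
)
print(describe_ams_process(cmdline))
# {'is_hmaster': True, 'max_heap': '1536m'}
```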



To find out where AMS writes its data, check the following property. In the default embedded mode it has this value:

hbase.rootdir = file:///var/lib/ambari-metrics-collector/hbase

NOTE:

 Ambari Metrics service uses HBase as its default storage backend. Set hbase.rootdir to either a local filesystem path, if using Ambari Metrics in embedded mode, or to an HDFS directory, for example hdfs://namenode.example.org:9000/hbase. By default HBase writes into /tmp. Change this setting, otherwise all data will be lost on machine restart.
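
The scheme of the hbase.rootdir value is what tells local storage apart from HDFS storage. A minimal sketch of that reading, with a hypothetical function name and the two example values quoted in this reply:

```python
from urllib.parse import urlparse


def storage_kind(rootdir: str) -> str:
    """Classify an hbase.rootdir value by its URI scheme."""
    scheme = urlparse(rootdir).scheme
    if scheme == "file":
        return "local filesystem"
    if scheme == "hdfs":
        return "HDFS"
    return "unknown"


print(storage_kind("file:///var/lib/ambari-metrics-collector/hbase"))  # local filesystem
print(storage_kind("hdfs://namenode.example.org:9000/hbase"))          # HDFS
```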

As this is a sandbox instance, a quick thing to try is cleaning up the AMS data as described in the following doc and then restarting AMS:

https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data

Expert Contributor

Well, I cleaned everything up as suggested and double-checked the configuration you mentioned, but the same problem still occurs.

The ps command returns this:

[root@sandbox ~]# ps -ef | grep ^ams
ams      14734     1  0 19:38 ?        00:00:00 bash /usr/lib/ams-hbase/bin/hbase-daemon.sh --config /etc/ams-hbase/conf foreground_start master
ams      14748 14734 10 19:38 ?        00:00:26 /usr/lib/jvm/java/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/ambari-metrics-collector/hs_err_pid%p.log -Djava.io.tmpdir=/var/lib/ambari-metrics-collector/hbase-tmp -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native/ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/ambari-metrics-collector/gc.log-201706141938 -Xms1536m -Xmx1536m -Xmn256m -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Dhbase.log.dir=/var/log/ambari-metrics-collector -Dhbase.log.file=hbase-ams-master-sandbox.hortonworks.com.log -Dhbase.home.dir=/usr/lib/ams-hbase/ -Dhbase.id.str=ams -Dhbase.root.logger=INFO,RFA -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start

And the ambari-metrics-collector.log file (cf. attachment ambari-metrics-collector.zip) contains several errors. The latest ones are:

2017-06-14 19:39:29,423 ERROR org.apache.helix.controller.GenericHelixController: ClusterEventProcessor failed while running the controller pipeline
java.lang.NullPointerException
        at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:276)
        at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)


2017-06-14 19:39:29,430 ERROR org.apache.helix.controller.GenericHelixController: Cluster manager: sandbox.hortonworks.com is not leader. Pipeline will not be invoked

Any clue about this? Could it be the reason why AMS stops automatically?

Thanks

Master Mentor

@Sebastien Chausson Please change the "timeline.metrics.service.webapp.address" value to "0.0.0.0:6188". It looks like it is incorrectly set to "0.0.0.0::host_group_1%:6188" somewhere, because I see the following error in your log:

Caused by: java.lang.IllegalArgumentException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
   at java.net.URI.create(URI.java:852)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:297)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:395)
   at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:180)
   ... 4 more
Caused by: java.net.URISyntaxException: Malformed escape pair at index 28: http://0.0.0.0::host_group_1%:6188:0
   at java.net.URI$Parser.fail(URI.java:2848)
   at java.net.URI$Parser.scanEscape(URI.java:2978)
   at java.net.URI$Parser.scan(URI.java:3001)
   at java.net.URI$Parser.parseAuthority(URI.java:3142)
   at java.net.URI$Parser.parseHierarchical(URI.java:3097)
   at java.net.URI$Parser.parse(URI.java:3053)
   at java.net.URI.<init>(URI.java:588)
   at java.net.URI.create(URI.java:850)
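
Before saving the property in Ambari, a candidate value can be sanity-checked as a plain host:port pair. This is only a sketch, not how Ambari or AMS validates the setting: the function name is hypothetical, and the pattern assumes a host made of letters, digits, dots and hyphens followed by a port between 1 and 65535.

```python
import re

# Assumed shape of a valid value: "<host>:<port>", nothing else.
HOST_PORT = re.compile(r"([A-Za-z0-9.-]+):(\d{1,5})")


def is_host_port(value: str) -> bool:
    """Return True if value looks like host:port with a port in 1..65535."""
    match = HOST_PORT.fullmatch(value.strip())
    return match is not None and 1 <= int(match.group(2)) <= 65535


print(is_host_port("0.0.0.0:6188"))                 # True
print(is_host_port("0.0.0.0::host_group_1%:6188"))  # False
```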

Expert Contributor

Thanks a lot, it fixed the problem. I don't know why this property was set like this "out of the box" in the sandbox...