
AMS not storing custom metrics

I'm facing an issue with custom Storm metrics not being stored in the HBase METRIC_RECORD table. I can, however, see that the metrics are being tracked, since they are returned by the /ws/v1/timeline/metrics/metadata endpoint.

I have set: timeline.metrics.service.outofband.time.allowance.millis=600000

For background, I'm using StormTimelineMetricsSink to push custom topology metrics to the metrics collector, and the log statements show that the metrics are being emitted properly. Although the metric names show up in the Grafana dropdown, there are no values to plot. I do see other metric values showing up in the default graphs for AMS_HBASE, HOST, JVM, etc.

I did `rm -rf` on the hbase-tmp/ folders a couple of times and started clean. I also verified there is plenty of space on the disk.

Could you please help identify the missing link so that the custom data gets pushed to the embedded HBase?

7 REPLIES

Re: AMS not storing custom metrics

Expert Contributor
@Usha V

What version of Ambari are you using? If you enable DEBUG logging for the AMS collector, do you see anything useful in the logs? Can you post a snippet of the metrics that you send, along with the time of sending?

Re: AMS not storing custom metrics

Aravindan,

We are using Ambari 2.6.1 but downgraded just the AMS version to 2.6.0 as per the answer in this post: https://community.hortonworks.com/questions/194448/ambari-metrics-collector-fails.html

I do not see anything in the metrics-collector logs that points to the issue (attached).

ambari-metrics-collector.txt

Re: AMS not storing custom metrics

@Aravindan Vijayan

Here's a snippet of the metric, grepped from the logs:

{
  "metrics" : [ {
    "metricname" : "__emit-count.__ack_ack",
    "appid" : "userscore",
    "instanceid" : null,
    "hostname" : "mc1.dev1",
    "timestamp" : 0,
    "starttime" : 1530563584,
    "type" : "String",
    "units" : null,
    "metrics" : {
      "1530563584" : 60.0,
      "1530563589" : 40.0,
      "1530563644" : 60.0,
      "1530563649" : 80.0,
      "1530563704" : 140.0,
      "1530563709" : 60.0,
      "1530563764" : 100.0,
      "1530563769" : 60.0,
      "1530563824" : 60.0,
      "1530563829" : 60.0,
      "1530563884" : 60.0,
      "1530563889" : 60.0,
      "1530563944" : 120.0,
      "1530563949" : 60.0,
      "1530564004" : 140.0,
      "1530564009" : 60.0,
      "1530564064" : 60.0,
      "1530564069" : 60.0,
      "1530564124" : 60.0,
      "1530564129" : 60.0,
      "1530564184" : 120.0,
      "1530564189" : 60.0,
      "1530564244" : 120.0,
      "1530564249" : 40.0,
      "1530564304" : 80.0,
      "1530564309" : 60.0,
      "1530564364" : 40.0,
      "1530564369" : 60.0,
      "1530564424" : 120.0,
      "1530564429" : 60.0,
      "1530564484" : 100.0,
      "1530564489" : 60.0,
      "1530564544" : 60.0,
      "1530564549" : 60.0,
      "1530564604" : 60.0
    },
    "metadata" : { }  
  } ]
}

Re: AMS not storing custom metrics

Expert Contributor

Can you add the following line to the ams-log4j config and restart the collector?

log4j.logger.org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor=DEBUG

Re: AMS not storing custom metrics

@Aravindan Vijayan

Attached are the logs with DEBUG on. No exceptions of any kind yet, if you were curious. Thanks for helping debug this issue.

Re: AMS not storing custom metrics

@Aravindan Vijayan

Attached are more log lines since turning on DEBUG.

ambari-metrics-collector-new.txt

I noticed that the timestamp format being logged differs between "currentTime" and "startTime" in this log line, which could potentially be the problem:

2018-07-02 22:49:05,986 DEBUG org.apache.hadoop.yarn.server.applicationhistoryservice.metrics.timeline.PhoenixHBaseAccessor: Discarding out of band timeseries, currentTime = 1530571745983, startTime = 1530570729, hostname = 
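For illustration, plugging those two values into the out-of-band check implied by that DEBUG line (a rough approximation, not the actual PhoenixHBaseAccessor source) shows why the series gets dropped even with the allowance raised to 600000 ms:

// Rough sketch of the out-of-band check hinted at by the DEBUG line above
// (an approximation for illustration, not the actual PhoenixHBaseAccessor code).
long currentTime = 1530571745983L; // epoch millis, from the log line
long startTime   = 1530570729L;    // sent as epoch seconds, but compared as millis
long allowance   = 600000L;        // timeline.metrics.service.outofband.time.allowance.millis

// currentTime - startTime ≈ 1.53e12 ms (startTime reads as January 1970 in millis),
// which is far beyond the 600000 ms allowance, so the timeseries is discarded.
boolean discarded = (currentTime - startTime) > allowance;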

Re: AMS not storing custom metrics

As noted above, the difference in timestamp format turned out to be the problem. Once I started multiplying taskInfo.timestamp by 1000 for the startTime value in TimelineMetric, the values started showing up in the METRIC_RECORD table. So it looks like the library expects UNIX time in milliseconds for startTime.
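For reference, here's a minimal sketch of where that conversion could go when building the metric, assuming Storm's IMetricsConsumer.TaskInfo (timestamp in epoch seconds) and the TimelineMetric bean from ambari-metrics-common; the class and setter names below are illustrative and may differ by version:

import java.util.TreeMap;

import org.apache.hadoop.metrics2.sink.timeline.TimelineMetric;
import org.apache.storm.metric.api.IMetricsConsumer.TaskInfo;

// Illustrative helper, not part of the stock StormTimelineMetricsSink.
public class CustomStormMetricBuilder {

    public TimelineMetric buildMetric(TaskInfo taskInfo, String metricName, double value) {
        // Storm reports TaskInfo.timestamp in epoch seconds; AMS expects epoch
        // milliseconds, so convert before setting startTime and the value keys.
        long startTimeMillis = taskInfo.timestamp * 1000L;

        TimelineMetric metric = new TimelineMetric();
        metric.setMetricName(metricName);
        metric.setAppId("userscore");              // topology id used as appId in the payload above
        metric.setHostName(taskInfo.srcWorkerHost);
        metric.setStartTime(startTimeMillis);      // millis keeps it inside the out-of-band allowance

        TreeMap<Long, Double> values = new TreeMap<>();
        values.put(startTimeMillis, value);        // value timestamps in millis as well
        metric.setMetricValues(values);
        return metric;
    }
}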