Do Metrics Monitor and HadoopTimelineMetricsSink retain metrics to emit while MetricsCollector is dead? Or, discard them?

Contributor

Do Metrics Monitor and HadoopTimelineMetricsSink retain the metrics they would emit while MetricsCollector is dead, or do they discard them?

If they retain metrics, for how long?

1 ACCEPTED SOLUTION

Master Mentor

@Takefumi Oide

Yes. "HadoopTimelineMetricsSink" is the sink code that runs inside components such as the DataNode, NameNode, NodeManager, and ResourceManager. It reads "/etc/hadoop/conf/hadoop-metrics2.properties" and, based on the information in that file, learns where the Metrics Collector should be running and on which port (default 6188). It then starts emitting data to the Metrics Collector. If the Metrics Collector is down, we will see "Connection Refused" messages in the component logs, but the sink keeps doing its job until the Collector comes back online and becomes available.

While no collector is live, the sink logs that the metrics to be sent will be discarded, and it then suppresses that message for the next 20 attempts to avoid duplicate logging in the component logs:

WARN  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(356)) - Unable to send metrics to collector by address:http://XXX.example.com:6188/ws/v1/timeline/metrics
INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
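To make the two log lines above easier to read, here is a minimal sketch of the behavior they describe: discard the metrics when no collector is live, log once, then stay quiet for the next 20 attempts. This is a toy model and not the Ambari code. The class name, the `collector_is_live` callback, and the exact counting are all assumptions for illustration; only the number 20 and the message text come from the log.

```python
class SketchSink:
    """Toy model of a sink that discards metrics while no collector is live.

    Hypothetical illustration only; not the real HadoopTimelineMetricsSink.
    """

    SKIP_COUNT = 20  # from the log text: "skipped for the next 20 times"

    def __init__(self, collector_is_live):
        self.collector_is_live = collector_is_live  # callable returning bool (assumed)
        self.sent = []        # batches that reached the collector
        self.discarded = 0    # batches dropped because no collector was live
        self.log = []         # messages that were actually written
        self._skips_left = 0  # how many more times the message stays suppressed

    def put_metrics(self, metrics):
        """Try to send one batch; return True if it was sent, False if discarded."""
        if self.collector_is_live():
            self.sent.append(metrics)
            return True
        # No live collector: the batch is dropped, not queued for later.
        self.discarded += 1
        if self._skips_left == 0:
            self.log.append(
                "No live collector to send metrics to. "
                "Metrics to be sent will be discarded."
            )
            self._skips_left = self.SKIP_COUNT
        else:
            self._skips_left -= 1
        return False
```

In this sketch, 42 consecutive failed attempts write the message only twice (on attempt 1 and attempt 22), while all 42 batches are counted as discarded.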


5 REPLIES

Contributor

Thanks, @Jay Kumar SenSharma

You mean that if, for example, Metrics Collector is dead from 9 am to 11 am, then after it recovers it will receive all the metrics from 9 am to 11 am, right?

Master Mentor

@Takefumi Oide

The sink uses small caches, and there are settings such as "maxRowCacheSize" and "sendInterval", which you can find under "Advanced hadoop-metrics2.properties" in the Ambari UI or in the relevant sink properties file.

Reference Links: https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-common/src/main/ja...

Before posting data to the AMS collector, the sink caches it for a short time, until the cache is full (bounded by "maxRowCacheSize"). The "sendInterval" setting can also be found in the code:

https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-hadoop-sink/src/ma...

Master Mentor

@Takefumi Oide Also, please have a look at the default values of "MAX_METRIC_ROW_CACHE_SIZE" (maxRowCacheSize, default 10000), "TimelineMetricsCache.MAX_RECS_PER_NAME_DEFAULT", and "METRICS_SEND_INTERVAL" (default 59000 milliseconds, i.e. about 1 minute).


https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-common/src/main/ja...

https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-hadoop-sink/src/ma...
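To show how a row limit and a send interval can work together, here is a rough sketch of a bounded buffer that is flushed on a timer. It is not the Ambari implementation. The class and method names are hypothetical, and the eviction policy (oldest rows are dropped once the buffer is full) is an assumption made for the example; only the two default values are taken from the constants mentioned above.

```python
from collections import deque

MAX_ROW_CACHE_SIZE = 10000  # default quoted above for maxRowCacheSize
SEND_INTERVAL_MS = 59000    # default quoted above for the send interval


class BoundedMetricBuffer:
    """Hypothetical bounded buffer that is drained once per send interval."""

    def __init__(self, max_rows=MAX_ROW_CACHE_SIZE,
                 send_interval_ms=SEND_INTERVAL_MS):
        # Assumption: once max_rows is reached, the oldest rows fall out.
        self.rows = deque(maxlen=max_rows)
        self.send_interval_ms = send_interval_ms
        self.last_send_ms = 0

    def add(self, timestamp_ms, name, value):
        """Buffer one datapoint until the next flush."""
        self.rows.append((timestamp_ms, name, value))

    def flush_if_due(self, now_ms):
        """Return the buffered rows if the interval has elapsed, else an empty list."""
        if now_ms - self.last_send_ms < self.send_interval_ms:
            return []
        batch = list(self.rows)
        self.rows.clear()
        self.last_send_ms = now_ms
        return batch
```

The point of the sketch is only that a cache of this kind is bounded: in this model it holds at most "max_rows" datapoints between sends, so it is a short-term buffer rather than long-term storage.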

Contributor

Thank you for your detailed comment!