Created 09-11-2018 06:37 AM
Do the Metrics Monitor and HadoopTimelineMetricsSink retain metrics to emit later while the Metrics Collector is down, or do they discard them?
If they retain metrics, for how long?
Created 09-11-2018 06:44 AM
Yes, The "HadoopTimelineMetricsSink" are actually the sink code running inside the components like DataNode/NameNode/NodeManager/ResourceManagers ...etc Which reads the "/etc/hadoop/conf/hadoop-metrics2.properties" and based on the INFO available in this file they know where the Metrics Collector should be running and the port (default 6188) and then they will start emitting the data to the Metrics Collector. If the Metrics Collector is down then we will see Connection Refused messages in the components logs but the Sink will keep doing it's job until the Collector Comes online & become available.
This logging is suppressed after 20 attempts to avoid duplicate messages in the component logs:
WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(356)) - Unable to send metrics to collector by address:http://XXX.example.com:6188/ws/v1/timeline/metrics
INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
Created 09-11-2018 06:57 AM
Thanks, @Jay Kumar SenSharma
You mean, for example, that even if the Metrics Collector is down from 9 AM to 11 AM, it will still receive all the metrics from that window once it recovers, right?
Created 09-11-2018 07:09 AM
The sink uses small caches, controlled by settings such as "maxRowCacheSize" and "sendInterval", which you can find under "Advanced hadoop-metrics2.properties" in the Ambari UI or in the relevant sink properties file.
Reference Links: https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-common/src/main/ja...
Before posting the data to the AMS Collector, the sink caches it for a short time, until the cache fills up to "maxRowCacheSize" or the "sendInterval" elapses. Both can be seen in the code above and in the link below (see also the sketch after it):
https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-hadoop-sink/src/ma...
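As a sketch of how these settings could be tuned in the sink section of hadoop-metrics2.properties (the key prefixes and values below are assumptions based on the Ambari 2.7 template and its defaults, so verify them against the linked source for your version):
*.sink.timeline.maxRowCacheSize=10000
*.sink.timeline.sendInterval=59000
With values like these the sink flushes roughly once a minute, and the cache size bounds how much data can be buffered between successful flushes.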
Created 09-11-2018 07:17 AM
@Takefumi Oide Also, please have a look at the default values for "MAX_METRIC_ROW_CACHE_SIZE" (maxRowCacheSize, default 10000), "TimelineMetricsCache.MAX_RECS_PER_NAME_DEFAULT", and "METRICS_SEND_INTERVAL" (sendInterval, default 59000 milliseconds, roughly one minute):
https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-common/src/main/ja...
https://github.com/apache/ambari/blob/release-2.7.0/ambari-metrics/ambari-metrics-hadoop-sink/src/ma...
Created 09-11-2018 10:04 AM
Thank you for your detailed comment!