moved metrics collector, missing metrics

New Contributor

Hi,

I moved our metrics-collector from one node to another.

I used the move function in the Ambari UI. It shut down the entire cluster to move the metrics collector (it is very bad that it doesn't do a rolling restart or let me do the move manually).

The move itself worked, but the setup is broken because some metrics are missing.

After the move I updated the "timeline.metrics.service.webapp.address" setting and set it to the new collector node and restarted the required services.

Some metrics show "No data available": HBase and YARN metrics are not reporting.

But for instance HDFS and System metrics are working.

And I'm running the collector in "distributed" mode.

I haven't found any related errors in the logs.

What can I do to fix this, and where should I look for errors?

Update:

Ambari v2.2.2.0, not kerberized.

Screenshots:

HDFS: 6604-screen-shot-2016-08-12-at-110630.png

HBase: 6605-screen-shot-2016-08-12-at-110200.png

YARN: 6606-screen-shot-2016-08-12-at-110140.png


Re: moved metrics collector, missing metrics

Expert Contributor

@Elias Abacioglu Start by looking in, say, the ResourceManager log for "HadooTimelineMetricsSink" messages.

If you are getting the system and HDFS metrics, the move itself should be OK, since those two cover both types of sinks.

- What version of Ambari are you on?

- Is the cluster kerberized?

Re: moved metrics collector, missing metrics

New Contributor

Updated the question with more info about version and kerberos.

grep HadooTimelineMetricsSink /var/log/hadoop-yarn/yarn/*

returned nothing on the Active ResourceManager and the Standby.

Re: moved metrics collector, missing metrics

Expert Contributor

I had a typo in the class name; the search string should be HadoopTimelineMetricsSink.

Re: moved metrics collector, missing metrics

New Contributor

With the corrected search string, there are errors like this:

2016-08-12 02:28:25,600 INFO  timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:emitMetrics(127)) - Unable to connect to collector, http://hadoop-master05:6188/ws/v1/timeline/metrics

2016-08-12 02:28:25,600 WARN  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(262)) - Unable to send metrics to collector by address:http://hadoop-master05:6188/ws/v1/timeline/metrics
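
To see at a glance which collector address the sinks are actually trying to reach, messages like the two above can be counted per URL. This is only a minimal Python sketch: the regular expression is an assumption based solely on the two log lines quoted here, and the function name failed_collector_urls is made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the two log lines quoted above:
# "...HadoopTimelineMetricsSink ... Unable to connect/send metrics to collector ... <url>"
SINK_ERROR = re.compile(
    r"HadoopTimelineMetricsSink.*Unable to (?:connect|send metrics) to collector"
    r".*?(https?://\S+)"
)

def failed_collector_urls(lines):
    """Count sink failure messages per collector URL (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SINK_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed directly, for example failed_collector_urls(open(path, errors="replace")). If the URLs that come back do not name the new collector host, the sink configuration on that node was probably not refreshed.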

Re: moved metrics collector, missing metrics

New Contributor

Forgot to mention, hadoop-master05:6188 is running a Yarn ApplicationHistoryServer.

Re: moved metrics collector, missing metrics

Expert Contributor

Based on the screenshots it seems like you should be getting metrics from RM and NMs.

Can you share your AMS memory settings? (ams-env :: collector_heapsize and ams-hbase-env :: master heap and region server heap sizes)

And how many nodes does your cluster have?

You can also look at Grafana to see which host or component is not sending metrics.
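
Without Grafana, the collector can also be asked directly over the same /ws/v1/timeline/metrics endpoint that the sinks post to. The sketch below only builds the query URL. The query parameter names (metricNames, appId, hostname) are how I understand the AMS REST API and should be checked against the documentation for your Ambari version; the helper name metrics_query_url is hypothetical.

```python
from urllib.parse import urlencode

def metrics_query_url(collector, metric_names, app_id, hostname=None):
    """Build a GET URL for the collector's timeline metrics endpoint.

    collector is "host:port", e.g. the new collector node on port 6188.
    Parameter names are assumptions about the AMS REST API, not confirmed here.
    """
    params = {"metricNames": ",".join(metric_names), "appId": app_id}
    if hostname:
        params["hostname"] = hostname
    # Keep commas readable when several metric names are requested.
    query = urlencode(params, safe=",")
    return f"http://{collector}/ws/v1/timeline/metrics?{query}"
```

Opening the resulting URL with curl or a browser for one ResourceManager metric and one HDFS metric should show whether the collector has stored anything for the components that report "No data available".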

Re: moved metrics collector, missing metrics

New Contributor

It's a 30-40 node cluster.

ams-env :: collector_heapsize = 512 MB
ams-hbase-env :: hbase_master_heapsize = 512 MB
ams-hbase-env :: hbase_regionserver_heapsize (region server heap) = 1280 MB

Re: moved metrics collector, missing metrics

Expert Contributor

> Forgot to mention, hadoop-master05:6188 is running a Yarn ApplicationHistoryServer.

ATS would be running on 8188, not 6188; AMS should be listening on 6188. The AMS collector is embedded in the ApplicationHistoryServer code, and the process output will indicate as much, but the daemon is the AMS collector and not YARN's AHS. In other words, this is not a cause for alarm.

Is hadoop-master05 the correct host that you moved AMS Collector to?
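
Since the sink reports "Unable to connect to collector", it may also be worth confirming from a ResourceManager or NodeManager host that port 6188 on the collector node is reachable at all. A minimal Python sketch of such a check follows; port_open is a hypothetical helper, and it only tests the TCP connection, not whether the collector accepts metrics.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_open("hadoop-master05", 6188) run on the active ResourceManager host. A False result points at a firewall, a name resolution problem, or a collector that is bound to a different interface.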

Re: moved metrics collector, missing metrics

New Contributor
/usr/jdk64/jdk1.8.0_60/bin/java -Xms512m -Xmx512m -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:GCLogFileSize=10M -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-201608111533 -cp /usr/lib/ambari-metrics-collector/*:/etc/ambari-metrics-collector/conf -Djava.net.preferIPv4Stack=true -Dams.log.dir=/var/log/ambari-metrics-collector -Dproc_timelineserver org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer

is the process listening on 6188. And yes, hadoop-master05 is the host I moved the AMS Collector to.
