I moved our metrics-collector from one node to another.
I used the move function in the Ambari UI. It shut down the entire cluster and moved the metrics collector (it is very bad that it neither does a rolling restart nor lets me do the restart manually).
The move itself worked, but the setup is broken because some metrics are missing.
After the move I updated the "timeline.metrics.service.webapp.address" setting and set it to the new collector node and restarted the required services.
Some metrics show "No data available". HBase and YARN metrics are not reporting.
But for instance HDFS and System metrics are working.
And I'm running the collector in "distributed" mode.
I haven't found any related errors in the logs.
What can I do now to fix this, where should I look for errors?
Ambari v126.96.36.199 Not kerberized.
@Elias Abacioglu Start by looking at, say, the ResourceManager log for "HadoopTimelineMetricsSink" messages.
If you have the system and HDFS metrics, the move itself should be OK, since those cover both types of sinks.
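A minimal sketch of that log check in Python, in case plain grep is awkward across many hosts. The function name find_sink_messages is made up for illustration, and the assumption that the log level is the third whitespace-separated field is based only on the log format quoted in this thread.

```python
from typing import Iterable, List, Sequence

# Logger name to look for in the ResourceManager (or other daemon) log.
SINK_MARKER = "HadoopTimelineMetricsSink"


def find_sink_messages(
    lines: Iterable[str],
    levels: Sequence[str] = ("INFO", "WARN", "ERROR"),
) -> List[str]:
    """Return the log lines that mention the metrics sink at one of the given levels."""
    hits = []
    for line in lines:
        if SINK_MARKER not in line:
            continue
        parts = line.split()
        # Assumed layout: <date> <time> <LEVEL> <logger> ...
        level = parts[2] if len(parts) > 2 else ""
        if level in levels:
            hits.append(line.rstrip("\n"))
    return hits


# Example use (the log path is an assumption, adjust it to your install):
#   with open("/var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-HOST.log") as fh:
#       for hit in find_sink_messages(fh, levels=("WARN", "ERROR")):
#           print(hit)
```

Note that the "Unable to connect to collector" message is logged at INFO, so filtering on WARN and ERROR alone would miss it.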
- What version of Ambari are you on?
- Is the cluster kerberized?
Updated the question with more info about version and kerberos.
returned nothing on the Active ResourceManager and the Standby.
There are errors like this
2016-08-12 02:28:25,600 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:emitMetrics(127)) - Unable to connect to collector, http://hadoop-master05:6188/ws/v1/timeline/metrics
2016-08-12 02:28:25,600 WARN timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:putMetrics(262)) - Unable to send metrics to collector by address:http://hadoop-master05:6188/ws/v1/timeline/metrics
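To tell whether errors like these come from a wrong address or from a collector that is not reachable, one option is to pull the collector URL out of the log line and try a plain TCP connection from the host that logged it. This is only a sketch: parse_collector_address and port_is_open are hypothetical helper names, and a successful TCP connect shows that something is listening on the port, not that the collector is healthy.

```python
import re
import socket
from typing import Optional, Tuple
from urllib.parse import urlparse

# Matches the first http(s) URL in a log line.
URL_PATTERN = re.compile(r"https?://\S+")


def parse_collector_address(log_line: str) -> Optional[Tuple[str, int]]:
    """Extract (host, port) of the collector URL from a sink log line, if present."""
    match = URL_PATTERN.search(log_line)
    if match is None:
        return None
    parsed = urlparse(match.group(0))
    if parsed.hostname is None or parsed.port is None:
        return None
    return parsed.hostname, parsed.port


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example use on the node that logged the error:
#   address = parse_collector_address(line)
#   if address is not None:
#       print(address, port_is_open(*address))
```

If the port is reachable from the ResourceManager host but the sink still cannot send, the problem is more likely on the collector side than in the configured address.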
Based on the screenshots it seems like you should be getting metrics from RM and NMs.
Can you share your AMS memory settings? (ams-env :: collector_heapsize and ams-hbase-env :: master heap and region server heap sizes)
And how many nodes are in your cluster?
You can actually look at Grafana for which host / component is not sending metrics.
It's a 30-40 node cluster.
ams-env :: collector_heapsize = 512 MB
ams-hbase-env :: hbase_master_heapsize = 512 MB
ams-hbase-env :: region heap = hbase_regionserver_heapsize = 1280 MB
Forgot to mention, hadoop-master05:6188 is running a Yarn ApplicationHistoryServer.
ATS would be running on 8188, not 6188; AMS should be listening on 6188. The AMS collector is embedded in the ApplicationHistoryServer code, and the process output will indicate as much, but the daemon is the AMS collector and not YARN's AHS. So what I am trying to convey is that this is not a cause for alarm.
Is hadoop-master05 the correct host that you moved AMS Collector to?
/usr/jdk64/jdk1.8.0_60/bin/java -Xms512m -Xmx512m -Djava.library.path=/usr/lib/ams-hbase/lib/hadoop-native -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:GCLogFileSize=10M -Xloggc:/var/log/ambari-metrics-collector/collector-gc.log-201608111533 -cp /usr/lib/ambari-metrics-collector/*:/etc/ambari-metrics-collector/conf -Djava.net.preferIPv4Stack=true -Dams.log.dir=/var/log/ambari-metrics-collector -Dproc_timelineserver org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
is the one listening to 6188. And yes hadoop-master05 is the host I moved AMS Collector to.
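Since the AMS collector shows up in the process list with the ApplicationHistoryServer main class, a rough way to tell the two apart is to look for AMS-specific markers in the command line. The sketch below is a heuristic based only on the command line quoted above (the ambari-metrics-collector classpath and the -Dams.log.dir property); looks_like_ams_collector is a hypothetical name and this is not an official check.

```python
def looks_like_ams_collector(cmdline: str) -> bool:
    """Heuristic: ApplicationHistoryServer main class plus AMS-specific markers."""
    has_ahs_main = "ApplicationHistoryServer" in cmdline
    # Markers taken from the command line shown in this thread (assumed typical).
    has_ams_markers = (
        "ambari-metrics-collector" in cmdline or "-Dams.log.dir" in cmdline
    )
    return has_ahs_main and has_ams_markers


# Example use with the output of `ps -ef` or /proc/<pid>/cmdline:
#   print(looks_like_ams_collector(cmdline_string))
```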