
Lost some NameNode metric data

Expert Contributor

I use Ambari and HDP.

I reinstalled Ambari Metrics and restarted it.

After the reinstall, I found that the metric data for

NameNode GC count

NameNode GC time

NN Connection Load

NameNode RPC

NameNode Operations

shows "No Data Available"

I tried restarting Ambari Metrics, but that did not resolve the issue.

And there are no errors shown in the Ambari Web UI.

3 REPLIES

Super Mentor

@darkz yu

By default, AMS runs in Embedded Mode, in which the metrics data is stored in the location specified by "hbase.rootdir" (default value: "file:///opt/log/ambari-metrics-collector/hbase") in the ams-hbase-site configuration.

So if that "hbase.rootdir" path (HDFS or local) has been removed or deleted, your old metrics data will be lost. Can you please check whether that is the case?
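A quick way to verify this is to check whether the directory still exists and contains data (use whatever path your "hbase.rootdir" actually points to; the path below is just the default mentioned above):

$ ls -l /opt/log/ambari-metrics-collector/hbase
$ du -sh /opt/log/ambari-metrics-collector/hbase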

Additional Checks:

1. After reinstalling AMS, have you restarted the dependent services (such as HDFS) so that these components can pick up the latest metrics sink JAR files? For example, check which sink JAR the NameNode process has loaded:

$ lsof -p $NAMENODE_PID | grep ambari-metrics
java  2193 hdfs  mem  REG  253,1  3125254 25188339 /usr/lib/ambari-metrics-hadoop-sink/ambari-metrics-hadoop-sink-with-common-2.5.1.0.159.jar
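(In the command above, $NAMENODE_PID is the NameNode process ID. On a typical HDP host it can be read from the NameNode PID file, for example:

$ NAMENODE_PID=$(cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid)

The exact PID file location may differ on your installation.)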

2. Also please check whether the HDFS components have started the timeline sink ("Sink timeline started"). This can be seen in the logs of the HDFS components such as the DataNode/NameNode.

Example:

$ grep -i 'Sink timeline started' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-kamb25101.example.com.log 
2017-06-09 12:02:23,386 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started


3. Check the hadoop-metrics2.properties file on the hosts where the HDFS components are installed to see whether it points to the correct AMS host, and whether that host is reachable:

$ grep 'host' /etc/hadoop/conf/hadoop-metrics2.properties
*.sink.timeline.slave.host.name=kamb25101.example.com
datanode.sink.timeline.collector.hosts=kamb25103.example.com
namenode.sink.timeline.collector.hosts=kamb25103.example.com
resourcemanager.sink.timeline.collector.hosts=kamb25103.example.com
nodemanager.sink.timeline.collector.hosts=kamb25103.example.com
jobhistoryserver.sink.timeline.collector.hosts=kamb25103.example.com
journalnode.sink.timeline.collector.hosts=kamb25103.example.com
maptask.sink.timeline.collector.hosts=kamb25103.example.com
reducetask.sink.timeline.collector.hosts=kamb25103.example.com
applicationhistoryserver.sink.timeline.collector.hosts=kamb25103.example.com


Also check whether the port is correct:

$ grep '*.sink.timeline.port=6188' /etc/hadoop/conf/hadoop-metrics2.properties
*.sink.timeline.port=6188
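As an additional quick test, you can check from the NameNode host that the collector is reachable on that port (the hostname and port below are just the values from the examples above; any HTTP response code at least confirms network connectivity):

$ curl -s -o /dev/null -w '%{http_code}\n' http://kamb25103.example.com:6188/ws/v1/timeline/metrics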


Expert Contributor

I checked the NameNode log.

Because I changed the host of the Ambari Metrics Collector, the log shows errors when posting to the old Ambari Metrics Collector host.

Super Mentor

@darkz yu

Have you followed the steps in the following doc to properly move AMS to a new host?

https://cwiki.apache.org/confluence/display/AMBARI/Moving+Metrics+Collector+to+a+new+host

The following points are important if you want to see your OLD metrics data after moving AMS to the new host:

If AMS is in embedded mode, copy the AMS data from the old node to the new node.

  • For embedded mode (ams-site: timeline.metrics.service.operation.mode), copy the hbase.rootdir and hbase.tmp.dir contents from the old collector host to the new host (see the sketch after this list).
  • For distributed mode, since AMS HBase is writing to HDFS, no change will be necessary.
  • Ensure that ams:hbase-site:hbase.rootdir and hbase.tmp.dir point to the correct locations on the new AMS node.
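For the embedded-mode copy mentioned in the first bullet, here is a minimal sketch (hostnames and paths are placeholders; use your actual old collector host and whatever directories hbase.rootdir and hbase.tmp.dir are configured to, and stop the Metrics Collector on both hosts before copying):

# run on the NEW collector host, with the Metrics Collector stopped on both hosts
$ rsync -a old-ams-host.example.com:/opt/log/ambari-metrics-collector/hbase/ /opt/log/ambari-metrics-collector/hbase/
$ rsync -a old-ams-host.example.com:/var/lib/ambari-metrics-collector/hbase-tmp/ /var/lib/ambari-metrics-collector/hbase-tmp/
# make sure the copied data is owned by the user that runs the Metrics Collector (typically 'ams')
$ chown -R ams:hadoop /opt/log/ambari-metrics-collector/hbase /var/lib/ambari-metrics-collector/hbase-tmp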

- The service daemons will be pointing to the old metrics collector host. Perform a rolling restart of slave components and a normal restart of Master components for them to pick up the new collector host.
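Once the restarts are done, you can re-run the check from point 3 of the earlier reply to confirm that the sinks now point at the new collector host:

$ grep 'collector.hosts' /etc/hadoop/conf/hadoop-metrics2.properties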
