TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

1. The Timeline Server is frequently going down. I restarted both MapReduce2 (MR2) and YARN. The installed version is 2.3.0.

2. Dashboard metrics (HDFS usage, DataNodes Live, NameNode Heap, NameNode uptime, NameNode RPC, etc.) are showing as N/A in the Prod environment. How can I get these metrics to show?

Please advise.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

The installed HDP version is 2.3.0.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

@kavitha velaga

It seems you have two issues here:

1) App Timeline Server issue: your App Timeline Server files might have been corrupted.

2) Metrics not showing in Ambari: restart Ambari Metrics and see whether they come back.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

Is it possible to fix the first issue (App Timeline Server)? If so, please share the solution.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

Unfortunately, restarting Ambari Metrics didn't help.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

Here is the solution for the App Timeline Server:

1) Go to the App Timeline Server host.

2) Back up the files under /opt/hadoop/yarn/timeline/.

3) Once they are backed up, restart the Timeline Server.

Note: you will lose the ResourceManager job history, but that was acceptable for us.
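Steps 2 and 3 above can be sketched in bash. Because the note says the job history is lost, this sketch assumes the backup in step 2 means moving the files out of the store directory rather than copying them. It is a minimal illustration, not the poster's script: the function name move_timeline_store_aside is hypothetical, it relies on GNU mv (-t), and it defaults to the /opt/hadoop/yarn/timeline/ path quoted above (on another cluster, check yarn.timeline-service.leveldb-timeline-store.path in yarn-site.xml). It is safest to run it while the App Timeline Server is stopped, as a user allowed to write to that directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the contents of the ATS store into a timestamped
# sibling directory so the Timeline Server starts with an empty store.
# Assumptions: default path is the one quoted in this thread; GNU mv (-t) exists.
move_timeline_store_aside() {
  local store_dir="${1:-/opt/hadoop/yarn/timeline}"
  store_dir="${store_dir%/}"   # drop a trailing slash so the backup is a sibling
  local backup_dir
  backup_dir="${store_dir}.bak.$(date +%Y%m%d%H%M%S)"

  if [[ ! -d "$store_dir" ]]; then
    echo "timeline store not found: $store_dir" >&2
    return 1
  fi

  mkdir -p "$backup_dir" || return 1
  # Move every top-level entry (dotfiles included) but keep the directory itself,
  # so its owner and permissions stay as they were for the restart.
  find "$store_dir" -mindepth 1 -maxdepth 1 -exec mv -t "$backup_dir" {} + || return 1
  echo "$backup_dir"
}
```

For example, source the file and run `move_timeline_store_aside /opt/hadoop/yarn/timeline`; it prints the backup directory it created. Then restart the App Timeline Server from Ambari, and keep the backup until the server has come up cleanly.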

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

Which version of Ambari are you using?

The missing metric charts are an AMS (Ambari Metrics System) issue; please refer to the troubleshooting wiki below:

https://cwiki.apache.org/confluence/display/AMBARI/Troubleshooting+Guide

And the official docs:

http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_ambari_reference_guide/content/ch_amb_ref...

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

Do you know why the AppTimelineServer is going down? Is it because of an OOM (OutOfMemoryError)?

There was an issue where the "yarn-env" configuration deployed by Ambari for HDP 2.3 had an incorrect line, so the AppTimelineServer heap size always defaulted to 1 GB, even if you changed the AppTimelineServer maximum heap size through Ambari.
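One way to answer the OOM question is to search the Timeline Server logs for java.lang.OutOfMemoryError. The bash sketch below is illustrative only: the function name count_ats_oom is hypothetical, and the default log directory /var/log/hadoop-yarn/yarn and the *timelineserver* file-name pattern are assumptions about a typical HDP layout, so adjust both to your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count OutOfMemoryError lines across Timeline Server logs.
# Assumption: logs live in /var/log/hadoop-yarn/yarn and have "timelineserver"
# in their file names (.log and .out files, including rotated copies).
count_ats_oom() {
  local log_dir="${1:-/var/log/hadoop-yarn/yarn}"
  local -a logs
  local restore_nullglob
  restore_nullglob="$(shopt -p nullglob)"   # remember the caller's setting
  shopt -s nullglob                         # unmatched glob expands to nothing
  logs=("$log_dir"/*timelineserver*)
  eval "$restore_nullglob"

  if (( ${#logs[@]} == 0 )); then
    echo "no timeline server logs in $log_dir" >&2
    return 1
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps status 0.
  cat -- "${logs[@]}" | grep -cF 'java.lang.OutOfMemoryError' || true
}
```

A non-zero count suggests the heap-size problem described here is worth checking; a zero count points elsewhere, such as the store corruption discussed earlier in the thread.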

Go to Services > YARN > Configs > Advanced > Advanced yarn-env and examine the content of the yarn-env template. If the template has this entry:

export YARN_HISTORYSERVER_HEAPSIZE={{apptimelineserver_heapsize}}

The entry needs to be changed to:

export YARN_TIMELINESERVER_HEAPSIZE={{apptimelineserver_heapsize}}

Once you make the change, save the config.
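The template itself is edited in the Ambari UI as described above. Once Ambari has pushed the new config to the App Timeline Server host (typically when you restart the components it flags), you can confirm which variable the deployed yarn-env.sh actually exports. This bash sketch is an illustration only: the function name check_ats_heap_var is hypothetical, and /etc/hadoop/conf/yarn-env.sh is an assumed default location for the file Ambari writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which heap variable a rendered yarn-env.sh exports
# for the App Timeline Server.
#   ok             -> YARN_TIMELINESERVER_HEAPSIZE is exported (corrected template)
#   wrong-variable -> only YARN_HISTORYSERVER_HEAPSIZE is exported (AMBARI-14715 symptom)
#   missing        -> neither line is present
# Assumption: default path /etc/hadoop/conf/yarn-env.sh.
check_ats_heap_var() {
  local env_file="${1:-/etc/hadoop/conf/yarn-env.sh}"
  if [[ ! -r "$env_file" ]]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  if grep -qE '^[[:space:]]*export YARN_TIMELINESERVER_HEAPSIZE=' "$env_file"; then
    echo "ok"
  elif grep -qE '^[[:space:]]*export YARN_HISTORYSERVER_HEAPSIZE=' "$env_file"; then
    echo "wrong-variable"
  else
    echo "missing"
  fi
}
```

Because Ambari regenerates yarn-env.sh from the template, this check is read-only by design; fixing the file by hand on the host would be overwritten on the next config push.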

This problem will be fixed in the upcoming Ambari 2.2.2 via https://issues.apache.org/jira/browse/AMBARI-14715.

Not sure if this is the specific issue you are running into.

Re: TimeLine server is frequently going down and some of the metrics are N/A in Prod cluster

AMS: Can you check the HBase and collector logs under /var/log/ambari-metrics-collector/?

Ignore ZooKeeper warnings in the HBase logs that look like this:

zookeeper.ClientCnxn: Session 0x154075e61350003 for server null, unexpected error, closing socket connection and attempting reconnect

First, check hbase*master*.log for relevant exceptions.
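That first pass can be scripted. The bash sketch below (the function name ams_master_exceptions is hypothetical) prints lines from hbase*master*.log under /var/log/ambari-metrics-collector/ that mention an exception or an ERROR/FATAL level, and drops the zookeeper.ClientCnxn reconnect noise mentioned above. It is a coarse filter: stack-trace lines that follow a ClientCnxn warning are not removed, so read the output in context.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first-pass scan of the AMS embedded HBase master logs.
ams_master_exceptions() {
  local log_dir="${1:-/var/log/ambari-metrics-collector}"
  local -a logs
  local restore_nullglob
  restore_nullglob="$(shopt -p nullglob)"   # remember the caller's setting
  shopt -s nullglob                         # unmatched glob expands to nothing
  logs=("$log_dir"/hbase*master*.log)
  eval "$restore_nullglob"

  if (( ${#logs[@]} == 0 )); then
    echo "no hbase*master*.log files in $log_dir" >&2
    return 1
  fi
  # Keep exception / ERROR / FATAL lines, then drop the ZooKeeper ClientCnxn
  # reconnect warnings that can be ignored.
  grep -hE 'Exception|ERROR|FATAL' -- "${logs[@]}" | grep -vF 'zookeeper.ClientCnxn' || true
}
```

If the master log is clean, the same idea can be pointed at the collector log in that directory by changing the glob.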