Support Questions

metrics monitor

Super Collaborator

Hi:

I have the metrics running, but I can only see the information for one node (the last one), and there are no alerts. Any suggestions?

[Screenshot: 7150-snip20160831-1.png]

25 REPLIES

@Roberto Sancho

Have you verified the Metrics Collector is running?

Have you tried restarting the Metrics Monitor on each of the nodes that are not reporting?

sudo -u ams /usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf stop
sudo -u ams /usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start

You can find the logs for the Metrics Monitor here:

/var/log/ambari-metrics-monitor/ambari-metrics-monitor.out
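
If you want a quick check for problems, something like this (just a sketch, using the log path above) will pull recent warnings or errors out of that log:

# scan the Metrics Monitor log for recent warnings or errors
tail -n 200 /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out | grep -iE "warn|error|exception"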

Here is more information on Ambari Metrics:

https://cwiki.apache.org/confluence/display/AMBARI/Metrics

Super Collaborator

Hi:

Everything is working, but when I run the Metrics service check, it runs on another machine where I have a Metrics Monitor.

[Screenshot: 7165-snip20160831-3.png]

So it's like Ambari thinks it is still on the last machine where I had it, but I moved it to another machine. Would it be a good idea to check the Ambari database and see if this service is registered there?

@Roberto Sancho

I'm not sure that I understand. You moved the Metrics Collector to another server, and Ambari seems to think the collector is running on the wrong server? When you look at each host in the cluster, the Metrics Collector should only exist on a single host. However, the Metrics Monitor should exist on all hosts from which you want to collect metrics.
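
Rather than querying the Ambari database directly, you can ask the Ambari REST API which host it currently thinks the Metrics Collector is on. This is only a sketch; the admin credentials, Ambari server host/port, and CLUSTER_NAME are placeholders you need to replace with your own values:

# list the host(s) Ambari has registered for the METRICS_COLLECTOR component
curl -u admin:admin "http://AMBARI_SERVER:8080/api/v1/clusters/CLUSTER_NAME/services/AMBARI_METRICS/components/METRICS_COLLECTOR?fields=host_components"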

It is possible that you have corrupted metrics data. If that is the case, you can follow the instructions at this link to work through it:

https://community.hortonworks.com/articles/11805/how-to-solve-ambari-metrics-corrupted-data.html
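
For what it's worth, that kind of cleanup generally comes down to stopping the collector, moving the embedded HBase data aside, and starting it again. The directories below are only the usual defaults; check the hbase.rootdir and hbase.tmp.dir values in your ams-site configuration before touching anything:

# rough outline only -- verify hbase.rootdir / hbase.tmp.dir in ams-site first
# 1. stop the Metrics Collector from the Ambari UI, then on the collector host:
mv /var/lib/ambari-metrics-collector/hbase /var/lib/ambari-metrics-collector/hbase.bak
mv /var/lib/ambari-metrics-collector/hbase-tmp /var/lib/ambari-metrics-collector/hbase-tmp.bak
# 2. start the Metrics Collector again from the Ambari UI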

Are you seeing any error messages in the Metrics Collector or Metrics Monitor logs?

Super Collaborator

Does the Metrics Collector need to be on the same machine as the Ambari Server? Because I don't have it set up that way.

No, I don't believe that is a requirement.
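
The monitors just need to be able to reach the collector over the network. A quick reachability check from one of the hosts that is not reporting (assuming the collector port 6188 shown in your logs; COLLECTOR_HOST is a placeholder) could look like this:

# any HTTP status code back means the collector port is reachable;
# a curl connection error means it is not
curl -s -o /dev/null -w "%{http_code}\n" "http://COLLECTOR_HOST:6188/ws/v1/timeline/metrics"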

Super Collaborator

Hi:

I ran the Metrics service check to see if everything is working, but I received this error:

[Screenshot: 7489-captura.png]

2016-09-08 12:55:54,436 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-09-08 12:55:54,566 - Ambari Metrics service check was started.
2016-09-08 12:55:54,604 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "xxxxxxxxxxxxxx",
      "timestamp": 1473332154000,
      "starttime": 1473332154000,
      "metrics": {
        "1473332154000": 0.136171923634,
        "1473332155000": 1473332154000
      }
    }
  ]
}
2016-09-08 12:55:54,604 - Connecting (POST) to xxxxxxx:6188/ws/v1/timeline/metrics/
2016-09-08 12:55:54,607 - Http response: 200 OK
2016-09-08 12:55:54,607 - Http data: {"errors":[]}
2016-09-08 12:55:54,608 - Metrics were saved.
2016-09-08 12:55:54,609 - Connecting (GET) to xxxxxxx:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=xxxxxxxxr&precision=seconds&grouped=false&startTime=1473332094000&appId=amssmoketestfake&endTime=1473332215000
2016-09-08 12:55:54,616 - Http response: 200 OK
2016-09-08 12:55:54,617 - Http data: {"metrics":[]}
2016-09-08 12:55:54,617 - Metrics were retrieved.
2016-09-08 12:55:54,617 - Values 0.136171923634 and 1473332154000 were not found in the response.

Any suggestions, please?



Super Collaborator

Hi:

There aren't any errors:

2016-08-31 21:34:43,978 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
2016-08-31 21:35:43,987 [INFO] emitter.py:91 - server: http://xxxxxxx:6188/ws/v1/timeline/metrics
21:38:03,175  INFO [Thread-28] TimelineClusterAggregatorSecond:83 - Saving 2252 metric aggregates.
21:38:03,320  INFO [Thread-28] TimelineClusterAggregatorSecond:258 - End aggregation cycle @ Wed Aug 31 21:38:03 CEST 2016
21:38:03,320  INFO [Thread-28] TimelineClusterAggregatorSecond:287 - End aggregation cycle @ Wed Aug 31 21:38:03 CEST 2016
21:40:03,092  INFO [Thread-28] TimelineClusterAggregatorSecond:124 - Last check point time: 1472672181320, lagBy: 221 seconds.
21:40:03,092  INFO [Thread-28] TimelineClusterAggregatorSecond:232 - Start aggregation cycle @ Wed Aug 31 21:40:03 CEST 2016, startTime = Wed Aug 31 21:36:21 CEST 2016, endTime = Wed Aug 31 21:38:21 CEST 2016
21:40:03,116  INFO [Thread-28] TimelineClusterAggregatorSecond:83 - Saving 2331 metric aggregates.
21:40:03,269  INFO [Thread-28] TimelineClusterAggregatorSecond:258 - End aggregation cycle @ Wed Aug 31 21:40:03 CEST 2016
21:40:03,269  INFO [Thread-28] TimelineClusterAggregatorSecond:287 - End aggregation cycle @ Wed Aug 31 21:40:03 CEST 2016

Super Collaborator

Hi:

I restarted the service on one server and the log looks fine:

2016-09-08 12:47:08,786 [INFO] host_info.py:291 - hostname_script: None
2016-09-08 12:47:08,803 [INFO] host_info.py:303 - Cached hostname: xxxxxxxxx
2016-09-08 12:47:08,804 [INFO] controller.py:102 - Adding event to cache, all : {u'metrics': [{u'value_threshold': u'128', u'name': u'bytes_out'}], u'collect_every': u'10'}
2016-09-08 12:47:08,804 [INFO] controller.py:110 - Adding event to cache,  : {u'metrics': [], u'collect_every': u'15'}
2016-09-08 12:47:08,805 [INFO] main.py:65 - Starting Server RPC Thread: /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
2016-09-08 12:47:08,805 [INFO] controller.py:57 - Running Controller thread: Thread-1
2016-09-08 12:47:08,806 [INFO] emitter.py:45 - Running Emitter thread: Thread-2
2016-09-08 12:47:08,806 [INFO] emitter.py:65 - Nothing to emit, resume waiting.
2016-09-08 12:48:08,815 [INFO] emitter.py:91 - server: http://xxxxxx:6188/ws/v1/timeline/metrics

Also, it looks like all the metrics are running. Any suggestions, please?

[Screenshot: 7487-captura.png]

@Roberto Sancho

In the smoke test logs you provided, there is clearly something not correct in the configuration. Here is where the smoke test is generating sample data:

2016-09-08 12:55:54,604 - Generated metrics:
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "xxxxxxxxxxxxxx",
      "timestamp": 1473332154000,
      "starttime": 1473332154000,
      "metrics": {
        "1473332154000": 0.136171923634,
        "1473332155000": 1473332154000
      }
    }
  ]
}

Looking after that part, you can see that a response was received, but the data contains no metrics, and that is why the check is complaining.

2016-09-08 12:55:54,616 - Http response: 200 OK
2016-09-08 12:55:54,617 - Http data: {"metrics":[]}
2016-09-08 12:55:54,617 - Metrics were retrieved.
2016-09-08 12:55:54,617 - Values 0.136171923634 and 1473332154000 were not found in the response.

The test expects the response to contain the smoke test data object that it created and posted, but it's getting an empty response: {"metrics" : []}. The POST shows no errors coming back in your log, but clearly no data is coming back when the smoke test requests the metrics. Have you verified that the server name shown in the logs where these calls are made is the host where you actually have AMS (the Metrics Collector) running?
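
If you want to check that by hand, you can replay the same GET the smoke test makes directly against the collector. This is only a sketch built from the URL in your own log output; substitute the collector host and the hostname/time window that the smoke test printed:

# replay the smoke-test query against the collector (parameters taken from your log)
curl "http://COLLECTOR_HOST:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=HOSTNAME&precision=seconds&grouped=false&startTime=1473332094000&appId=amssmoketestfake&endTime=1473332215000"
# and confirm which collector host the monitor on that machine is configured to emit to
grep -ri "6188" /etc/ambari-metrics-monitor/conf/

If that query also comes back as {"metrics":[]} when run on the collector host itself, then the data is not being written where the check is reading it, which again points at a host or configuration mismatch.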