Check Ambari Metrics + service check

We ran the service check for Ambari Metrics

and got the following error: "All metrics collectors are unavailable."

What can we do to solve this problem?

68429-capture.png

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 207, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 159, in service_check
    raise Fail("All metrics collectors are unavailable.")
resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

A few checks would be good to perform:

1. Verify whether Ambari is falsely showing the AMS services as running. On the AMS collector host, check whether the AMS collector is actually running and listening on the correct address/port:

# netstat -tnlpa | grep 6188
# hostname -f 

2. Verify that the PIDs of the AMS processes match the PIDs listed in the following files. Mismatched PIDs sometimes cause false information in the UI.

$ ps -ef | grep ^ams | grep ApplicationHistoryServer
$ cat /var/run/ambari-metrics-collector/ambari-metrics-collector.pid

$ ps -ef | grep ^ams | grep HMaster
$ cat /var/run/ambari-metrics-collector/hbase-ams-master.pid

3. Can you try restarting the AMS collector service once to see whether any errors appear in the AMS collector logs?
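
The first two checks can also be scripted. The following is a minimal Python 3 sketch, not part of Ambari: the function names `port_is_listening` and `pid_file_matches` are hypothetical, while port 6188 and the PID file paths come from the commands above. It assumes a Linux host, where a running process has a `/proc/<pid>` directory.

```python
import os
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def pid_file_matches(pid_file, proc_root="/proc"):
    """Return True if pid_file holds the PID of a process that currently exists.

    proc_root is a parameter only so the check can be exercised against a
    fake directory tree; on a real host the default /proc is what you want.
    """
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed PID file
    return os.path.isdir(os.path.join(proc_root, str(pid)))
```

On the collector host, `port_is_listening(socket.getfqdn(), 6188)` roughly corresponds to the `netstat` and `hostname -f` check, and `pid_file_matches("/var/run/ambari-metrics-collector/ambari-metrics-collector.pid")` to the PID comparison. Note that the PID check only confirms that some process with that PID exists; the `ps` commands above are still needed to confirm it is the AMS process.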


7 REPLIES 7

Master Mentor

@Michael Bronson

Also, please share the output of the following command from the AMS collector host and a few of the cluster nodes, to verify whether the AMS package version matches the Ambari binary version:

# rpm -qa | grep ambari

rpm -qa | grep ambari
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
ambari-agent-2.6.1.0-143.x86_64
Michael-Bronson
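
For comparing versions across several nodes, the package list above can be checked mechanically. This is a small illustrative sketch: the helper names `ambari_versions` and `versions_consistent` are hypothetical, and the regular expression assumes the usual `name-version-release.arch` layout of `rpm -qa` output.

```python
import re

# Matches names such as "ambari-agent-2.6.1.0-143.x86_64".
_RPM_NAME = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def ambari_versions(rpm_lines):
    """Map each ambari package name to its 'version-release' string."""
    versions = {}
    for line in rpm_lines:
        match = _RPM_NAME.match(line.strip())
        if match and match.group("name").startswith("ambari"):
            versions[match.group("name")] = "{}-{}".format(
                match.group("version"), match.group("release")
            )
    return versions


def versions_consistent(rpm_lines):
    """Return True if all listed ambari packages share one version-release."""
    return len(set(ambari_versions(rpm_lines).values())) <= 1


# Package names exactly as posted in the thread.
sample = [
    "ambari-metrics-monitor-2.6.1.0-143.x86_64",
    "ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64",
    "ambari-agent-2.6.1.0-143.x86_64",
]
print(ambari_versions(sample))
print(versions_consistent(sample))  # True
```

Feeding the combined output from the collector host and the other nodes into `versions_consistent` would flag any host whose Ambari packages are on a different build.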

About "restarting the AMS": if we restart the whole metrics service, will that also restart the AMS collector?

Michael-Bronson

Master Mentor

@Michael Bronson

Yes, Ambari UI --> Ambari Metrics --> Service Actions --> Restart All restarts the Ambari Metrics Monitor, Grafana, and the Metrics Collector.

@Jay After we set the correct date on all machines, the Ambari Metrics service check now passes. Does that make sense?

Michael-Bronson

Master Mentor

@Michael Bronson

Yes, the AMS service check basically posts a dummy metric to the collector and queries it with the start and end times set in the script (both relative to the current time), so a time difference between machines can be a valid reason for a failing service check.

      get_metrics_parameters = {
        "metricNames": "AMBARI_METRICS.SmokeTest.FakeMetric",
        "appId": "amssmoketestfake",
        "hostname": params.hostname,
        "startTime": current_time - 60000,
        "endTime": current_time + 61000,
        "precision": "seconds",
        "grouped": "false",
      }

https://github.com/apache/ambari/blob/release-2.6.1/ambari-server/src/main/resources/common-services...

And

https://github.com/apache/ambari/blob/release-2.6.1/ambari-server/src/main/resources/common-services...
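
To make the time-window argument concrete, here is a small sketch that reuses the offsets from the excerpt above. The helper names `query_window` and `visible_in_window` are hypothetical, and the sketch assumes, purely for illustration, that the data point ends up stamped by a clock that differs from the one used to build the query. How the collector actually treats skewed timestamps is not shown in this thread.

```python
def query_window(current_time_ms):
    """Return (startTime, endTime) using the offsets from the service check excerpt."""
    return current_time_ms - 60000, current_time_ms + 61000


def visible_in_window(metric_time_ms, current_time_ms):
    """Return True if a data point's timestamp falls inside the query window."""
    start, end = query_window(current_time_ms)
    return start <= metric_time_ms <= end


now_ms = 1_520_000_000_000  # arbitrary example timestamp in milliseconds

# Clocks agree: the data point is inside the window.
print(visible_in_window(now_ms, now_ms))  # True

# Hypothetical 5-minute skew: the data point falls outside the window.
print(visible_in_window(now_ms - 5 * 60 * 1000, now_ms))  # False
```

The window is only about two minutes wide, so under this assumption even a modest clock difference is enough for the fake metric to be missed, which fits with the check passing once the dates were corrected.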