Created on 04-10-2018 07:53 PM - edited 08-17-2019 08:15 PM
We ran the service check for Ambari Metrics
and got the following error: "All metrics collectors are unavailable"
What can we do to solve this problem?
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 207, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/service_check.py", line 159, in service_check
    raise Fail("All metrics collectors are unavailable.")
resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
Created 04-11-2018 09:01 AM
A few checks would be good to perform:
1. Check whether Ambari is falsely showing the AMS services as running. On the AMS collector host, verify that the AMS collector is actually running and listening on the correct address/port:
# netstat -tnlpa | grep 6188
# hostname -f
2. Verify that the PIDs of the AMS processes match the PIDs listed in the following files. Mismatched PIDs sometimes cause false information in the UI:
$ ps -ef | grep ^ams | grep ApplicationHistoryServer
$ cat /var/run/ambari-metrics-collector/ambari-metrics-collector.pid
$ ps -ef | grep ^ams | grep HMaster
$ cat /var/run/ambari-metrics-collector/hbase-ams-master.pid
3. Try restarting the AMS collector service once and check whether any errors appear in the AMS collector logs.
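Checks 1 and 2 above can also be scripted. The following is only a minimal sketch in Python 3, not part of Ambari: the helper names `is_port_listening` and `pid_file_matches` are made up for illustration, and port 6188 is simply the collector port used in the `netstat` command above.

```python
import socket


def is_port_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unresolvable host, etc.
        return False


def pid_file_matches(pid_file_text, running_pids):
    """Return True if the PID recorded in a pid file is among the running PIDs.

    pid_file_text: contents of e.g. ambari-metrics-collector.pid
    running_pids:  PIDs taken from the `ps -ef` output for the same process
    """
    try:
        recorded = int(pid_file_text.strip())
    except ValueError:
        # Empty or corrupt pid file
        return False
    return recorded in set(running_pids)
```

For example, `is_port_listening("collector.example.com", 6188)` (hypothetical hostname) reports whether anything accepts connections on the collector port, and `pid_file_matches(open(path).read(), pids)` flags a stale pid file.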
Created 04-11-2018 09:04 AM
Also, please share the output of the following command from the AMS collector host and from a few of the cluster nodes, to verify whether the AMS version is the same as the Ambari binary version:
# rpm -qa | grep ambari
Created 04-11-2018 09:11 AM
rpm -qa | grep ambari
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
ambari-agent-2.6.1.0-143.x86_64
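When comparing this output across many nodes, a small script can help. This is a rough sketch in Python 3 under some assumptions: the package strings have the `name-version-release.arch` shape shown above with a purely numeric release, and the function names `ambari_versions` and `versions_consistent` are hypothetical.

```python
import re

# name-version-release.arch, e.g. ambari-agent-2.6.1.0-143.x86_64
# Assumes the release part is purely numeric, as in the output above.
_RPM_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*-\d+)\.(?P<arch>\w+)$")


def ambari_versions(rpm_output):
    """Map each ambari package name to its version-release string."""
    versions = {}
    for token in rpm_output.split():
        match = _RPM_RE.match(token)
        if match and match.group("name").startswith("ambari"):
            versions[match.group("name")] = match.group("version")
    return versions


def versions_consistent(rpm_output):
    """Return True if all ambari packages found share one version-release."""
    return len(set(ambari_versions(rpm_output).values())) == 1
```

Feeding it the three lines above yields the same `2.6.1.0-143` for every package, so the versions on that node agree with each other.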
Created 04-11-2018 09:13 AM
About "restarting the AMS": when we restart all of the metrics services, will that also restart the AMS?
Created 04-11-2018 09:17 AM
@Jay after we set the correct date on all machines, the Ambari Metrics service check now passes. Do you think this is logical?
Created 04-11-2018 09:55 AM
Yes. The AMS service check basically posts a dummy metric to the collector, with the start and end times set in the script relative to the current time, so a time difference between hosts can be a valid reason for the service check to fail.
get_metrics_parameters = {
    "metricNames": "AMBARI_METRICS.SmokeTest.FakeMetric",
    "appId": "amssmoketestfake",
    "hostname": params.hostname,
    "startTime": current_time - 60000,
    "endTime": current_time + 61000,
    "precision": "seconds",
    "grouped": "false",
}
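To make the window arithmetic concrete, here is a simplified Python 3 sketch. It is not the code from `service_check.py`: the function names are invented, times are assumed to be in milliseconds, and it assumes that clock skew shows up as a metric timestamp shifted relative to the `current_time` used for the query.

```python
def query_window(current_time_ms):
    """Return (startTime, endTime) using the offsets from the snippet above."""
    return current_time_ms - 60000, current_time_ms + 61000


def falls_in_window(metric_ts_ms, current_time_ms):
    """Return True if a metric timestamp lies inside the query window."""
    start, end = query_window(current_time_ms)
    return start <= metric_ts_ms <= end


now = 1523437000000  # an arbitrary example timestamp in milliseconds

print(falls_in_window(now, now))              # no skew
print(falls_in_window(now - 30000, now))      # 30 s behind
print(falls_in_window(now - 5 * 60000, now))  # 5 min behind
```

Under these assumptions, a timestamp that is off by more than roughly a minute lands outside the window, so the dummy metric would not be found and the check would report the collectors as unavailable.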
And