If you notice that the Ambari Metrics Service (AMS) keeps going down, please provide the following information, which was requested in the other thread.
Initially, we can start by tuning the AMS collector and HMaster heap sizes a bit. Before doing that, we need to make sure that enough free memory is available on the host:
# free -g
Once we have enough information, we can find out what might be causing AMS to go down.
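The free-memory check can also be scripted, which helps when the same check has to be repeated across hosts. The following is only a minimal sketch: the function names `mem_available_gb` and `has_headroom` are hypothetical, and it assumes a Linux kernel that exposes the `MemAvailable` field in /proc/meminfo. The amount of headroom you actually need depends on the heap sizes you plan to set.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Ambari or AMS.

# mem_available_gb [FILE]
# Prints the available memory in whole GiB (rounded down), read from a
# meminfo-format file. Defaults to /proc/meminfo.
mem_available_gb() {
    meminfo="${1:-/proc/meminfo}"
    # MemAvailable is reported in kB; divide twice by 1024 to get GiB.
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# has_headroom REQUIRED_GB [FILE]
# Succeeds if the available memory is at least REQUIRED_GB.
has_headroom() {
    required="$1"
    available="$(mem_available_gb "${2:-/proc/meminfo}")"
    # Fail if MemAvailable was not found (older kernels do not report it).
    [ -n "$available" ] && [ "$available" -ge "$required" ]
}
```

For example, `has_headroom 4 && echo "enough memory for a larger heap"` prints the message only if at least 4 GiB is available; the value 4 is purely illustrative.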
For more accurate tuning recommendations (data needed)
For more accurate tuning of the AMS service, you may need to collect some additional details, such as the output of the following API calls and the current memory usage:
1. The JSON response from the following API calls, to see the unique metrics and hosts and so understand the load on AMS.
2. Current memory usage (the complete GC logs: for HMaster, /var/log/ambari-metrics-collector/gc.log-xxxxxx, and for the AMS collector, /var/log/ambari-metrics-collector/collector-gc.logxxxxxxxx )
3. The output of the AMS HMaster jmx call: AMS HBase jmx Snapshot -
4. The AMS HMaster UI output to understand the "Region Count" and "StoreFile Count"
5. The complete GC logs of the AMS collector and HMaster.
6. AMS Configurations:
# tar czhfv /tmp/ams_collector_etc_$(hostname)_$(date +"%Y%m%d%H%M%S").tar.gz /etc/ambari-metrics-collector/
# tar czhfv /tmp/ams_hmaster_etc_$(hostname)_$(date +"%Y%m%d%H%M%S").tar.gz /etc/ams-hbase/
7. The log files of the AMS collector and the AMS HBase master.
For AMS:
/var/log/ambari-metrics-collector/hbase-ams-master-xxxxx.log
/var/log/ambari-metrics-collector/ambari-metrics-collector.log
For Metrics Monitor:
/var/log/ambari-metrics-monitor/ambari-metrics-monitor.log
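Since the GC logs and the collector and HMaster logs all live under the log directories listed above, it may be simpler to archive those directories as a whole rather than picking out individual files. The following is a rough sketch along the lines of the tar commands used for the configurations. The function name `collect_ams_logs` is hypothetical, and it assumes the default log locations apply on your host; directories that do not exist are skipped.

```shell
#!/bin/sh
# Sketch only: collect_ams_logs is a hypothetical helper, not an Ambari tool.

# collect_ams_logs OUT_TAR DIR...
# Archives every listed directory that exists into OUT_TAR (gzip-compressed).
# Returns non-zero if none of the directories exist.
collect_ams_logs() {
    out="$1"
    shift
    count=$#
    # Rotate through the arguments, keeping only directories that exist.
    while [ "$count" -gt 0 ]; do
        dir="$1"
        shift
        [ -d "$dir" ] && set -- "$@" "$dir"
        count=$((count - 1))
    done
    if [ "$#" -eq 0 ]; then
        echo "no log directories found" >&2
        return 1
    fi
    # -h follows symlinks, matching the flags used for the config archives.
    tar czhf "$out" "$@"
}
```

A possible invocation, run as a user who can read the logs:

# collect_ams_logs /tmp/ams_logs_$(hostname)_$(date +"%Y%m%d%H%M%S").tar.gz /var/log/ambari-metrics-collector /var/log/ambari-metrics-monitor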