
Ambari Metrics Collector crashes every day


Rising Star

It crashes almost every day, throwing the following error:

 Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD

I have cleaned the HBase data following https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data , but the crash still happens.

How can I solve this?

2 REPLIES

Re: Ambari Metrics Collector crashes every day

Super Mentor

@Junfeng Chen

It might be due to AMS tuning; in particular, the heap settings might be incorrect.

Can you please let us know:

1. What is the cluster size (number of nodes in the cluster)?

2. What is the mode of the AMS service, Embedded or Distributed? You can find it by looking at the property "timeline.metrics.service.operation.mode".

3. What are the current heap settings for "hbase_master_heapsize", "metrics_collector_heapsize", and "hbase_regionserver_heapsize"?

The following doc provides very good heap-tuning guidance based on the number of nodes in the cluster; can you please try that:

https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning


Also, if possible, can you please attach the output of the following API calls? They will help us see how many metrics, and which metrics, are being collected (just to find out whether there are too many).

http://<ams-host>:6188/ws/v1/timeline/metrics/metadata
http://<ams-host>:6188/ws/v1/timeline/metrics/hosts 
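Once you have saved the metadata response, a small script can summarize it. This is only a sketch: it assumes the metadata endpoint returns JSON keyed by app id, with a list of metric descriptors per app (the exact response shape may differ in your AMS version), and the sample data below is hypothetical.

```python
# Sketch: count how many distinct metrics each app reports, assuming the
# /ws/v1/timeline/metrics/metadata response is JSON shaped roughly like
# {"appid": [{"metricname": "..."}, ...], ...}. Shape is an assumption.
import json

def summarize_metrics(metadata):
    """Return {appid: metric_count}, largest contributors first."""
    counts = {app: len(metrics) for app, metrics in metadata.items()}
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))

# Hypothetical sample response, only for illustration.
sample = {
    "datanode": [{"metricname": "dfs.datanode.BytesRead"},
                 {"metricname": "dfs.datanode.BytesWritten"}],
    "HOST": [{"metricname": "cpu_idle"}],
}
print(json.dumps(summarize_metrics(sample)))
```

Sorting by count makes it easy to spot an app that is flooding the collector with metrics.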


Re: Ambari Metrics Collector crashes every day

Rising Star

@Jay Kumar SenSharma

I have only 3 nodes in the cluster, running in embedded mode. hbase_master_heapsize=1152MB, metrics_collector_heapsize=1024MB, hbase_regionserver_heapsize=512MB.

The cluster is in an environment with no internet access, and it is not allowed to send anything outside. But I have checked the data returned by the APIs: the hosts API gives 6 hosts (3 host names in upper case, and the same 3 names in lower case) plus one fake entry, and the metadata API returns about 25,000 lines of data.
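The 3 upper-case plus 3 lower-case entries suggest the same machines registered twice under differently-cased names, which would double the per-host metrics stored. A small sketch to spot such duplicates in the hosts list (the host names below are hypothetical placeholders):

```python
# Sketch: group host names from the hosts API that differ only in
# letter case, to confirm the same machines appear twice.
from collections import defaultdict

def find_case_duplicates(hostnames):
    """Return {lowercased_name: [original variants]} for names seen
    in more than one casing."""
    groups = defaultdict(list)
    for name in hostnames:
        groups[name.lower()].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Hypothetical host list resembling the situation described above.
hosts = ["NODE1.EXAMPLE.COM", "node1.example.com",
         "NODE2.EXAMPLE.COM", "node2.example.com",
         "NODE3.EXAMPLE.COM", "node3.example.com",
         "fakehost"]
print(find_case_duplicates(hosts))
```

If duplicates show up, making the OS hostname casing consistent across nodes (and cleaning the stale entries) should cut the stored metric volume roughly in half.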