Member since
09-28-2015
95
Posts
51
Kudos Received
13
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2495 | 09-14-2017 11:20 PM |
| | 3401 | 06-20-2017 06:26 PM |
| | 1800 | 05-31-2017 07:27 PM |
| | 1202 | 02-07-2017 06:24 PM |
| | 9860 | 01-04-2017 11:11 PM |
12-15-2022
03:46 AM
Try the Skivia app. It can sync Grafana and Hive data without coding. Read more here.
02-27-2022
11:00 PM
1 Kudo
This solution works for me on HDP 3.1.4 / Ambari 2.7. Thanks for sharing.
09-10-2020
08:00 PM
It worked for me.
11-04-2019
11:36 PM
You need to stop the Ambari Metrics service via Ambari and then remove all temp files. Go to the Ambari Metrics Collector service host and execute the command below:

mv /var/lib/ambari-metrics-collector /tmp/ambari-metrics-collector_OLD

Now you can restart the AMS service, and Ambari Metrics should be good again.
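A minimal sketch of the same move-aside pattern, demonstrated on a scratch directory. The paths here are placeholders, not the real AMS paths; on a real cluster the data directory is /var/lib/ambari-metrics-collector and the service must be stopped first.

```shell
# Demonstrate the "move data aside" recovery pattern on a scratch directory.
AMS_DATA=$(mktemp -d)            # stands in for /var/lib/ambari-metrics-collector
touch "$AMS_DATA/metric.dat"     # pretend collector data
BACKUP="${AMS_DATA}_OLD"
mv "$AMS_DATA" "$BACKUP"         # same pattern as the mv command above
mkdir -p "$AMS_DATA"             # the collector recreates its state here on restart
ls "$BACKUP"                     # old data is kept for inspection or rollback
```

The point of moving rather than deleting is that the old data stays available for a post-mortem and can be restored if the reset does not help.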
10-26-2017
08:10 PM
3 Kudos
Understanding scale issues in AMS (why do they happen?)

The Metrics Collector component is the central daemon that receives metrics from ALL the service sinks and monitors that send metrics. The collector uses HBase as its store and Phoenix as the data access layer. At a high level, the Metrics Collector continuously performs 2 operations related to scale:
- Handle raw writes: a raw write is a batch of metric data points received from services, written to HBase through Phoenix. There is no read or aggregation involved.
- Periodically aggregate data: AMS aggregates data across the cluster and across time.
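As a rough illustration of the aggregation side (toy numbers, with awk standing in for what AMS actually does via Phoenix/HBase):

```shell
# Cluster aggregation: one metric (mem_free, in MB) reported by 3 hosts at the
# same timestamp collapses into a single min/max/avg/sum row.
printf 'host1 2048\nhost2 1024\nhost3 4096\n' |
awk '{ sum+=$2; if (min=="" || $2<min) min=$2; if ($2>max) max=$2; n++ }
     END { printf "min=%d max=%d avg=%d sum=%d\n", min, max, sum/n, sum }'
# -> min=1024 max=4096 avg=2389 sum=7168

# Time aggregation (downsampling): 10-second samples rolled up into
# 300-second (5 min) buckets, one averaged value per bucket.
printf '0 10\n10 20\n290 30\n300 40\n310 50\n' |
awk '{ b=int($1/300); s[b]+=$2; c[b]++ }
     END { for (b in s) printf "bucket=%d avg=%d\n", b, s[b]/c[b] }'
# -> bucket=0 avg=20 and bucket=1 avg=45 (bucket order may vary)
```

In both cases many input rows become one output row, which is why the reads are the expensive part and the writes are comparatively small.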
- Cluster Aggregator: computing the min, max, avg and sum of a metric (for example, memory) across all hosts is done by a cluster aggregator. This is the 'TimelineClusterAggregatorSecond', which runs every 2 mins. In every run it reads the entire last 2 mins of data, calculates aggregates, and writes them back. The read is expensive since it has to read non-aggregated data, while the write volume is smaller since it is aggregated data. For example, in a 100-node cluster, mem_free from 100 hosts becomes 1 aggregate metric value in this aggregator.
- Time Aggregator: also called 'downsampling', this aggregator rolls up data in the time dimension. This lets AMS TTL out fine-grained seconds data while holding aggregate data for longer. For example, if there is a data point every 10 seconds, the 5-min time aggregator takes the 30 data points in each 5-min window and creates 1 rolled-up value. There are higher-level downsamplers (1 hour, 1 day) as well, and they use their immediate predecessor's data (1 hr => 5 mins, 1 day => 1 hr). However, it is the 5-min aggregator that is compute-heavy, since it reads the entire last 5 mins of data and downsamples it. Again, the read is very expensive since it has to read non-aggregated data, while the write volume is smaller. This downsampler is called 'TimelineHostAggregatorMinute'.

Scale problems occur in AMS when one or both of the above operations cannot happen smoothly. The 'load' on AMS is determined by the following factors:
- How many hosts are in the cluster?
- How many metrics is each component sending to AMS?

Either of the above can cause performance issues in AMS.

How do we find out if AMS is experiencing scale problems? One or more of the following consequences can be seen on the cluster:
- The Metrics Collector shuts down intermittently. Since auto-restart is enabled for the Metrics Collector by default, this shows up as an alert stating 'Metrics collector has been auto restarted # times the last 1 hour'.
- Partial metrics data is seen:
Only non-aggregated host metrics are seen (HDFS NameNode metrics / host summary page on Ambari / System - Servers Grafana dashboard), while aggregated data is not seen (AMS summary page / System - Home Grafana dashboard / HBase - Home Grafana dashboard).

Fixing / Recovering from the problem

Step 1: Get the current state of the system.

The above problems could occur because of 2-3 underlying reasons.

Underlying problem: Too many metrics (#4 from above).
What it could cause: ALL of the problems mentioned above.
Fix / Workaround:
#1: Trying out config changes
- First, try increasing the memory of the Metrics Collector and HBase Master / RegionServer based on mode (refer to the memory configurations table at the top of the page).
- Configure AMS to read more data in a single Phoenix fetch. Set ams-site: timeline.metrics.service.resultset.fetchSize = 5000 (for <100 nodes) or 10000 (for >100 nodes).
- Increase the HBase RegionServer handler count. Set ams-hbase-site: hbase.regionserver.handler.count = 30.
- If Hive is sending a lot of metrics, do not aggregate Hive table metrics. Set ams-site: timeline.metrics.cluster.aggregation.sql.filters = sdisk_%,boottime,default.General% (only from Ambari-2.5.0).
#2: Reducing the number of metrics
If the above config changes do not improve AMS stability, you can whitelist selected metrics or blacklist the metrics of the components causing the load issue.
Whitelisting doc: Ambari Metrics - Whitelisting

Underlying problem: The AMS node has slow disks; the disk is not able to keep up with the high data volume.
What it could cause: raw write and aggregation problems.
Fix / Workaround:
- On larger clusters (>800 nodes) with distributed mode, we suggest 3-5 SSDs on the Metrics Collector node and a config group for the DataNode on that host to use those 3-5 disks as directories.
- ams-hbase-site :: hbase.rootdir: change this path to a disk mount that is not heavily contended.
- ams-hbase-site :: hbase.tmp.dir: change this path to a location different from hbase.rootdir.
- ams-hbase-site :: hbase.wal.dir: change this path to a location different from hbase.rootdir (from Ambari-2.5.1).
- Metric whitelisting will help decrease the metric load.

Underlying problem: Known issues around the HBase normalizer and FIFO compaction, documented in Known Issues (#11 and #13).
What it could cause: this can be identified via #5 in the above table.
Fix / Workaround: follow the workaround steps in the Known Issues doc.

Other Advanced Configurations

| Configuration | Property | Description | Minimum recommended values |
|---|---|---|---|
| ams-site | phoenix.query.maxGlobalMemoryPercentage | Percentage of total heap memory used by Phoenix threads in the Metrics Collector API/Aggregator daemon. | 20-30, based on available memory. Default = 25. |
| ams-site | phoenix.spool.directory | Directory for Phoenix spill files (client side). | Set this to a different disk from hbase.rootdir if possible. |
| ams-hbase-site | phoenix.spool.directory | Directory for Phoenix spill files (server side). | Set this to a different disk from hbase.rootdir if possible. |
| ams-hbase-site | phoenix.query.spoolThresholdBytes | Threshold size in bytes after which the results of queries executed in parallel are spooled to disk. | Set this higher based on available memory. Default is 12 MB. |
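Gathered in one place, the config changes suggested in this post look like the fragment below (the values are the starting points given above; tune them for your cluster size):

```
# ams-site
timeline.metrics.service.resultset.fetchSize=10000        # 5000 for <100 nodes
timeline.metrics.cluster.aggregation.sql.filters=sdisk_%,boottime,default.General%   # Ambari-2.5.0+

# ams-hbase-site
hbase.regionserver.handler.count=30
```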
10-11-2017
08:38 PM
@darkz yu Yes, that sounds right. Sorry, I missed the step that would have triggered the modification of the '/etc/hadoop/conf/hadoop-metrics2.properties' file. So, is your problem fixed now?
07-02-2018
04:07 PM
Hello, I'm facing a similar issue with metrics not being stored in the HBase METRIC_RECORD table. I can, however, see that the metrics are being tracked, since they are returned by the /ws/v1/timeline/metrics/metadata endpoint. I have set: timeline.metrics.service.outofband.time.allowance.millis=600000. For background, I'm using StormTimelineMetricsSink to push custom topology metrics to the Metrics Collector, and the log statements show that the metrics are being emitted properly. Although the metric names show up in the Grafana dropdown, there are no values to plot. I do see other metric values showing up in the default graphs for AMS_HBASE, HOST, etc. I did rm -rf the hbase-tmp/ folder a couple of times and started clean. I also verified there is plenty of space on the disk. Could you please help identify the missing connection, so I can see the custom data pushed to the embedded HBase?
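As an aside, a quick way to count how many metric series that metadata endpoint is tracking. The JSON below is simulated sample data (assumed shape: a map of component to metric descriptors); against a live collector you would pipe `curl -s http://<collector-host>:6188/ws/v1/timeline/metrics/metadata` into the same one-liner instead.

```shell
# Simulated metadata response (assumption), counted with a python3 one-liner.
printf '{"namenode":[{"metricname":"m1"},{"metricname":"m2"}],"storm":[{"metricname":"m3"}]}' |
python3 -c 'import json,sys; d=json.load(sys.stdin); print(sum(len(v) for v in d.values()))'
# -> 3
```

A sudden jump in that count after deploying a new sink is a hint that a component is flooding AMS with metrics.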
06-01-2017
05:28 AM
@Aravindan Vijayan yes, I've done that part. It's been working since the upgrade. Maybe the documentation should be updated so it is not confusing. I remember running into something similar when upgrading to 2.4 last year. Speaking of documentation, it should be more explicit about Grafana when upgrading from 2.5 to 2.6: in 2.5, Grafana is already part of the stack, but the upgrade docs treat it as a service that must be installed, rather than explaining how to upgrade it (or whether an upgrade is needed at all).