Created on 01-18-2016 08:03 PM - edited 09-16-2022 02:58 AM
Hi all,
Last night I got many of the following Ambari critical alerts:
There are {x} stale alerts from {n} host(s): {components list}
where {x}, {n} and {components list} were not always the same. For example:
There are 20 stale alerts from 1 host(s): NameNode Web UI, Metrics Monitor Status, WebHCat Server Status, NameNode High Availability Health, HST Server Process, NameNode Last Checkpoint, Flume Agent Status, Oozie Server Status, ZooKeeper Failover Controller Process, HBase Master Process, ResourceManager Web UI, HDFS Upgrade Finalized State, Ambari Agent Disk Usage, NameNode Directory Status, DataNode Health Summary, Oozie Server Web UI, DRPC Server Process, NodeManager Health Summary, RegionServers Health Summary, HiveServer2 Process
After 6 minutes, Ambari sent an OK alert:
All alerts have run within their time intervals.
These messages repeated over and over again (13 critical and 13 OK alerts within 5 hours). This is the first time I have seen so many alerts from our cluster in a single night. All the services look fine in Ambari this morning, and there have been no more alerts.
Does anybody have any insight into what might have caused this?
Thank you very much in advance!
Xi Sanderson
Created 02-26-2016 01:22 PM
Hi all,
I opened a support ticket and got an answer back regarding the metastore alerts. It is a known bug in the Ambari release I have (2.1.2):
https://issues.apache.org/jira/browse/AMBARI-14424
The suggested solution is to edit this script:
/var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py
Search for 30 and replace it with 120, then restart the Ambari server.
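For anyone who prefers to script the edit, here is a minimal sketch of the search-and-replace step above. The function names are hypothetical, and the matching rule (only a standalone 30, not 300 or 30.5) is an assumption, since the support answer did not say which occurrence of 30 is meant. It keeps a `.bak` copy of the file so the change can be undone.

```python
import re
import shutil
from pathlib import Path
from typing import Tuple

# Path quoted from the support answer in this post.
SCRIPT = Path(
    "/var/lib/ambari-server/resources/common-services/"
    "HIVE/0.12.0.2.0/package/alerts/alert_hive_metastore.py"
)


def replace_standalone_number(source: str, old: str = "30", new: str = "120") -> Tuple[str, int]:
    """Replace `old` only where it stands alone (assumption: not part of
    a longer number, a decimal, or an identifier).
    Returns the new text and the number of replacements made."""
    pattern = re.compile(rf"(?<![\w.]){re.escape(old)}(?![\w.])")
    return pattern.subn(new, source)


def patch_file(path: Path) -> int:
    """Back up `path` next to itself as <name>.bak, then rewrite it in place.
    Returns how many occurrences were replaced."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the original for rollback
    patched, count = replace_standalone_number(path.read_text())
    path.write_text(patched)
    return count
```

To use it, call `patch_file(SCRIPT)` on the Ambari server host with sufficient permissions, compare the result against the `.bak` copy to confirm only the intended value changed, and then run `ambari-server restart`.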
I have yet to see how the change works out.
Thanks for all the help, everyone!
Xi
Created 02-25-2016 04:50 PM
Definitely open a support ticket and use SmartSense to collect logs. Look in /var/log/hive for metastore-specific logs and paste any errors from there here. Maybe we can help.
Created 02-27-2016 04:18 PM
Thanks for sharing this useful information. How can I download the patch from JIRA and install it, rather than making the change manually?
This is the first time I am installing an Ambari patch 😞
Created 02-27-2016 05:03 PM
Only apply patches if necessary and when instructed by support. If you don't have a support contract, here are Pivotal's instructions for patching Ambari; we don't provide steps for the reasons above. http://hawq.docs.pivotal.io/docs-hawq/topics/hdp-prerequisites.html
Needless to say, it's at your own risk.