
Modified Ambari Disk alert Threshold is not getting into Effect


Re: Modified Ambari Disk alert Threshold is not getting into Effect

Super Collaborator

Can you show the results of the 2 GET commands? That would help us see the state of the system WRT stale alerts and the thresholds.

You can also take a look at what definition was sent down to the agent. On any given agent, run this:

grep "ambari_agent_disk_usage" /var/lib/ambari-agent/cache/alerts/definitions.json -A40 -B2
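If the grep output is hard to read, the same information can be pulled out with a short script. The sketch below is not part of Ambari. It assumes only that definitions.json is valid JSON and that each alert definition is an object with a "name" key and a "source" holding a "parameters" list, so it searches the whole document instead of relying on a particular top-level layout. The function names are made up for illustration.

```python
import json


def find_definitions(node, alert_name):
    """Recursively yield every dict whose "name" equals alert_name."""
    if isinstance(node, dict):
        if node.get("name") == alert_name:
            yield node
        for value in node.values():
            yield from find_definitions(value, alert_name)
    elif isinstance(node, list):
        for item in node:
            yield from find_definitions(item, alert_name)


def script_parameters(definition):
    """Return {parameter name: value} from a definition's source.parameters."""
    params = definition.get("source", {}).get("parameters", [])
    return {p["name"]: p["value"] for p in params}


def load_and_report(path, alert_name="ambari_agent_disk_usage"):
    """Print the label and parameters of each matching definition in the file."""
    with open(path) as handle:
        document = json.load(handle)
    for definition in find_definitions(document, alert_name):
        print(definition.get("label"), script_parameters(definition))
```

On an agent host this could be called as load_and_report("/var/lib/ambari-agent/cache/alerts/definitions.json") to list the threshold values the agent currently has cached.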

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

@Jonathan Here is the output from one of the agent nodes. It looks like the new value is populated, but it is not reflected in the Ambari console, which still shows the old alert at 78%. The output of the other 2 GET commands is too large to paste here.

$ grep "ambari_agent_disk_usage" /var/lib/ambari-agent/cache/alerts/definitions.json -A40 -B2
{
  "ignore_host": false,
  "name": "ambari_agent_disk_usage",
  "componentName": "AMBARI_AGENT",
  "interval": 1,
  "clusterId": 2,
  "uuid": "03d1a5aa-d3ea-41c6-b52c-187da3128f74",
  "label": "Host Disk Usage",
  "definitionId": 43,
  "source": {
    "path": "alert_disk_space.py",
    "type": "SCRIPT",
    "parameters": [
      {
        "display_name": "Minimum Free Space",
        "name": "minimum.free.space",
        "value": "5.0E9",
        "threshold": "WARNING",
        "units": "bytes",
        "type": "NUMERIC",
        "description": "The overall amount of free disk space left before an alert is triggered."
      },
      {
        "display_name": "Warning",
        "name": "percent.used.space.warning.threshold",
        "value": "0.9",
        "threshold": "WARNING",
        "units": "%",
        "type": "PERCENT",
        "description": "The percent of disk space consumed before a warning is triggered."
      },
      {
        "display_name": "Critical",
        "name": "percent.free.space.critical.threshold",
        "value": "0.95",
        "threshold": "CRITICAL",
        "units": "%",
        "type": "PERCENT",
        "description": "The percent of disk space consumed before a critical alert is triggered."
      }
    ]
  },
  "serviceName": "AMBARI",
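Note that this definition carries three parameters, not two: besides the warning and critical percentages there is a minimum free space of 5.0E9 bytes, also tagged WARNING. The sketch below shows how those three values might combine. It is an illustration based only on the parameter descriptions in the definition, not the code of alert_disk_space.py, and the function name and the order of the checks are assumptions. Under these assumptions, a disk at 78% used is below the 0.9 warning threshold but could still report WARNING if less than 5.0E9 bytes are free.

```python
def evaluate_disk_alert(total_bytes, used_bytes,
                        warning_pct=0.9, critical_pct=0.95,
                        minimum_free_bytes=5.0e9):
    """Hypothetical evaluation of the three parameters from the definition."""
    if total_bytes <= 0:
        return "UNKNOWN"
    used_fraction = used_bytes / float(total_bytes)
    free_bytes = total_bytes - used_bytes
    if used_fraction > critical_pct:
        return "CRITICAL"
    if used_fraction > warning_pct:
        return "WARNING"
    # Assumed: the free-space floor can raise a WARNING even when
    # neither percentage threshold is exceeded.
    if free_bytes < minimum_free_bytes:
        return "WARNING"
    return "OK"
```

For example, evaluate_disk_alert(100e9, 78e9) returns "OK", while evaluate_disk_alert(20e9, 15.6e9) returns "WARNING": both disks are 78% used, but the second has only about 4.4e9 bytes free. If the real script behaves similarly, raising the percentage thresholds alone would not clear a warning caused by the free-space floor.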


Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

Sorry, that pasted as junk. How do you paste in a proper format?

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Super Collaborator

So that makes me think that the alert is not running. Can you check the agent log in /var/log/ambari-agent/ambari-agent.log and see if there's anything that indicates that the alert wasn't able to run?


Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

@Jonathan Yes, I do see a lot of warning messages regarding cache updates...

WARNING 2016-06-08 17:47:35,880 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//common-services/YARN/2.1.0.2.0/package/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,889 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//host_scripts/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,890 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//stacks/HDP/2.0.6/hooks/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,891 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//common-services/YARN/2.1.0.2.0/package/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,902 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//host_scripts/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,783 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//common-services/HDFS/2.1.0.2.0/package/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,904 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//common-services/ZOOKEEPER/3.4.5.2.0/package/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,914 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//host_scripts/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,915 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//stacks/HDP/2.0.6/hooks/.hash : <urlopen error [Errno -2] Name or service not known>
WARNING 2016-06-08 17:47:35,915 FileCache.py:162 - Error occurred during cache update. Error tolerate setting is set to true, so ignoring this error and continuing with current cache. Error details: Can not download file from url http://EN:8080/resources//common-services/ZOOKEEPER/3.4.5.2.0/package/.hash : <urlopen error [Errno -2] Name or service not known>
INFO 2016-06-08 17:47:43,723 Heartbeat.py:78 - Building Heartbeat: {responseId = 8320, timestamp = 1465379263722, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-08 17:47:43,733 Controller.py:268 - Heartbeat response received (id = 8321)
INFO 2016-06-08 17:47:53,734 Heartbeat.py:78 - Building Heartbeat: {responseId = 8321, timestamp = 1465379273734, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-08 17:47:53,736 Controller.py:268 - Heartbeat response received (id = 8322)
INFO 2016-06-08 17:48:03,737 Heartbeat.py:78 - Building Heartbeat: {responseId = 8322, timestamp = 1465379283737, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-08 17:48:03,892 Controller.py:268 - Heartbeat response received (id = 8323)
INFO 2016-06-08 17:48:13,892 Heartbeat.py:78 - Building Heartbeat: {responseId = 8323, timestamp = 1465379293892, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-08 17:48:13,895 Controller.py:268 - Heartbeat response received (id = 8324)
INFO 2016-06-08 17:48:23,895 Heartbeat.py:78 - Building Heartbeat: {responseId = 8324, timestamp = 1465379303895, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-08 17:48:23,898 Controller.py:268 - Heartbeat response received (id = 8325)
WARNING 2016-06-08 17:48:24,685 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2016-06-08 17:48:24,694 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-08 17:48:24,695 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-08 17:48:24,698 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Super Collaborator

Those are OK. What if you grep the log for "[Alert]"?

grep "[Alert]" /var/log/ambari-agent/ambari-agent.log

And this is on the host that's still reporting the 78% warning? Disabling/Enabling the alert as you have indicated clears the alert data out. If it shows back up, it means that the agent is running it properly. However, the data on the agent indicates that the new definition was updated.

What if you restart ambari-agent? Does that schedule the new definition?
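A note on the grep pattern: in a regular expression, "[Alert]" is a bracket expression that matches any single one of the characters A, l, e, r or t, so it also matches ordinary heartbeat lines. To match the literal text, grep -F "[Alert]" or an escaped pattern can be used instead. The Python sketch below shows the difference on two lines taken from the agent log in this thread. The helper name is made up for illustration.

```python
import re

lines = [
    "INFO 2016-06-08 17:47:43,733 Controller.py:268 - Heartbeat response received (id = 8321)",
    "WARNING 2016-06-08 17:48:24,694 base_alert.py:417 - [Alert][namenode_webui] HA nameservice "
    "value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}",
]

# Unescaped: a character class, so any line containing A, l, e, r or t matches.
as_character_class = [line for line in lines if re.search("[Alert]", line)]

# Escaped: only lines containing the literal text "[Alert]" match.
as_literal = [line for line in lines if re.search(re.escape("[Alert]"), line)]


def lines_for_alert(log_lines, alert_name):
    """Keep only lines logged for one alert, e.g. "[Alert][namenode_webui]"."""
    marker = "[Alert][{0}]".format(alert_name)
    return [line for line in log_lines if marker in line]
```

Here the unescaped pattern keeps both lines and the escaped one keeps only the base_alert.py warning. Calling lines_for_alert(lines, "ambari_agent_disk_usage") narrows the search to the disk usage alert itself, which is the one of interest in this thread.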


Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

Here are the last few lines from grepping the log file for "[Alert]". Restarting the Ambari agent and server did not help, and neither did disabling and re-enabling the alert.

INFO 2016-06-09 11:55:19,692 Controller.py:268 - Heartbeat response received (id = 14821)
WARNING 2016-06-09 11:55:24,689 base_alert.py:417 - [Alert][yarn_resourcemanager_rpc_latency] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2016-06-09 11:55:24,690 base_alert.py:417 - [Alert][yarn_resourcemanager_webui] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2016-06-09 11:55:24,697 base_alert.py:417 - [Alert][yarn_resourcemanager_cpu] HA nameservice value is present but there are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
WARNING 2016-06-09 11:55:24,706 base_alert.py:417 - [Alert][namenode_cpu] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,707 base_alert.py:417 - [Alert][namenode_hdfs_blocks_health] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,707 base_alert.py:417 - [Alert][namenode_rpc_latency] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,709 base_alert.py:417 - [Alert][namenode_webui] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,710 base_alert.py:417 - [Alert][datanode_health_summary] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,712 base_alert.py:417 - [Alert][namenode_hdfs_pending_deletion_blocks] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,714 base_alert.py:417 - [Alert][namenode_hdfs_capacity_utilization] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2016-06-09 11:55:24,718 base_alert.py:417 - [Alert][namenode_directory_status] HA nameservice value is present but there are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2016-06-09 11:55:29,693 Heartbeat.py:78 - Building Heartbeat: {responseId = 14821, timestamp = 1465444529693, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-09 11:55:29,698 Controller.py:268 - Heartbeat response received (id = 14822)
INFO 2016-06-09 11:55:39,698 Heartbeat.py:78 - Building Heartbeat: {responseId = 14822, timestamp = 1465444539698, commandsInProgress = False, componentsMapped = True}
INFO 2016-06-09 11:55:39,701 Controller.py:268 - Heartbeat response received (id = 14823)


Re: Modified Ambari Disk alert Threshold is not getting into Effect

@Muthukumar S

1. Is this a problem for most or all of the alerts, or only for specific alerts?

2. Can you try scheduling a sample test alert for another service (say, a custom alert) and check whether that works?

For custom alerts, please see: https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.htm...


Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Super Collaborator

So it seems like you're saying this:

- You have at least 1 host disk usage alert. This alert is stuck at a WARNING with 78%.

- You've tried disabling/enabling the alert. When you disable the alert, the warning disappears for about a minute and then reappears. This indicates that the agent is still successfully running the alert.

- You have verified that the /var/lib/ambari-agent/cache/alerts/definitions.json includes your changes. You've also restarted the agent to ensure that the new alert was picked up.

At this point, I think we need to just take a look at your logs. Can you upload the following files from the host reporting the alert at 78%?

- /var/lib/ambari-agent/cache/alerts/definitions.json

- /var/log/ambari-agent/ambari-agent.log

- /var/log/ambari-agent/ambari-alerts.log

This should tell us what's going on.
