
Modified Ambari Disk alert Threshold is not getting into Effect


Expert Contributor

HDP@Muthukumar

How do I make a disk alert threshold change take effect? I changed the disk threshold (in the description) through the Ambari console, and I also changed the alert threshold with a curl command on the Ambari server (from 50% to 90% for warning and from 80% to 95% for critical) and verified it with a curl GET. I also changed the value in the file /var/lib/ambari-server/resources/host_scripts/alert_disk_space.py. After that I restarted the ambari-server service, restarted the Ambari agents, and tried disabling and re-enabling the host disk alert. Disk usage is now at 78%, the warning still exists, and the new values are not taking effect. The version is HDP 2.4. Are there any steps I am missing? I would appreciate your reply.

Also, some stale alerts (in red) remain and are not being removed. This is a separate issue.

Thank you.

25 REPLIES

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Any change made to the "/var/lib/ambari-server/resources/host_scripts/alert_disk_space.py" script requires ambari-server to be restarted so that the Ambari server pushes the change to the agent cache directory.

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

@Joy, thank you for the reply. As I mentioned in the question, I did restart the ambari-server service, but the alert still remains at 78% for /usr/hdp.

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Any change made to the "/var/lib/ambari-server/resources/host_scripts/alert_disk_space.py" script requires ambari-server to be restarted so that the Ambari server pushes the change to the agent cache directory "/var/lib/ambari-agent/cache/host_scripts" for all the hosts where the agents are installed.

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Explorer

Hey, can you let me know where you changed the value, I mean which parameter? I was facing a similar issue, and it worked when I changed the values of the parameters PERCENT_USED_WARNING_KEY = percent.used.space.warning.threshold and PERCENT_USED_CRITICAL_KEY = percent.free.space.critical.threshold in alert_disk_space.py.

Example: PERCENT_USED_WARNING_KEY = 10, PERCENT_USED_CRITICAL_KEY = 20. Can you try testing with that?

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

@sganatra Thank you for the reply. Yes, I changed it in the same place and restarted the ambari-server service and the agents, and I even tried disabling and re-enabling the alert. The alert still remains on the console at 78%. Here is the output from that file.

$ cat /var/lib/ambari-server/resources/host_scripts/alert_disk_space.py | egrep 'PERCENT_USED_WARNING_DEFAULT|PERCENT_USED_CRITICAL_DEFAULT'
PERCENT_USED_WARNING_DEFAULT = 90
PERCENT_USED_CRITICAL_DEFAULT = 95

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Explorer

I see that you are setting it in PERCENT_USED_WARNING_DEFAULT, but I changed it in PERCENT_USED_WARNING_KEY. Can you try that once?

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Explorer

Did that work? Please share your output so that we can work out the next steps.

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Super Collaborator

The stale alerts are mostly from a host that was removed from the system but was still heartbeating at the time of removal. Depending on which version of Ambari you are running, it may tell you which alerts are stale. If it does, you can simply disable and re-enable them to clear out any stale ones.

If Ambari doesn't tell you which ones are stale, you can use this query (replacing clusterName with the name of your cluster):

GET api/v1/clusters/<clusterName>/alerts?Alert/latest_timestamp%3C1465201080000&fields=Alert/label

This will get you all alerts which haven't run in the past day or so (the number in the query is a timestamp in milliseconds since the epoch). You can then simply disable and re-enable them in the web UI to clear them out.
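The cutoff timestamp in that query has to be recomputed for the current time. A minimal sketch of building the query path, assuming a 24-hour cutoff; the helper name `stale_alerts_path` and its arguments are hypothetical and not part of Ambari:

```python
import time
from urllib.parse import quote


def stale_alerts_path(cluster_name, max_age_hours=24, now_ms=None):
    """Build the relative API path that lists alerts which have not run
    within the last max_age_hours (hypothetical helper, not an Ambari API)."""
    if now_ms is None:
        # Ambari alert timestamps are in milliseconds since the epoch.
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - max_age_hours * 60 * 60 * 1000
    # %3C is the URL-encoded "<" used in the query above.
    return (
        "api/v1/clusters/{0}/alerts?Alert/latest_timestamp%3C{1}"
        "&fields=Alert/label"
    ).format(quote(cluster_name, safe=""), cutoff_ms)


if __name__ == "__main__":
    # "c1" is the example cluster name used in this thread.
    print(stale_alerts_path("c1"))
```

Append the printed path to your Ambari server address (for example http://localhost:8080/) and issue it as a GET with your admin credentials.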

As for your other issue with the disk space alerts, changing the description isn't going to do anything. Depending on your Ambari version, the thresholds are either parameterized or hard-coded in the alert_disk_space.py file. It sounds to me like they are parameterized, which means that changing the "default" values in the Python file won't actually do anything. You need to edit the parameters on the alert definition itself.

Later versions of Ambari expose these in the web client, but I'm guessing that you're on a version which does not. You can change them by doing this:

First, GET the definition (this should show you the parameters being used). Then you can PUT it right back, updating the value in each threshold parameter.
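To illustrate why editing the defaults has no effect when the thresholds are parameterized, here is a simplified sketch of how such a script could resolve its thresholds. It is not the actual alert_disk_space.py; the function names `resolve_thresholds` and `classify` and the strict "greater than" comparison are assumptions made for illustration:

```python
# Parameter names as they appear on the alert definition.
PERCENT_USED_WARNING_KEY = "percent.used.space.warning.threshold"
PERCENT_USED_CRITICAL_KEY = "percent.free.space.critical.threshold"

# Fallback values, used only when the definition supplies no parameter.
PERCENT_USED_WARNING_DEFAULT = 50
PERCENT_USED_CRITICAL_DEFAULT = 80


def resolve_thresholds(parameters):
    """Return (warning, critical); parameters from the definition win."""
    warning = PERCENT_USED_WARNING_DEFAULT
    critical = PERCENT_USED_CRITICAL_DEFAULT
    if PERCENT_USED_WARNING_KEY in parameters:
        warning = float(parameters[PERCENT_USED_WARNING_KEY])
    if PERCENT_USED_CRITICAL_KEY in parameters:
        critical = float(parameters[PERCENT_USED_CRITICAL_KEY])
    return warning, critical


def classify(percent_used, parameters):
    """Map a disk usage percentage to an alert state (assumed comparison)."""
    warning, critical = resolve_thresholds(parameters)
    if percent_used > critical:
        return "CRITICAL"
    if percent_used > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    old = {PERCENT_USED_WARNING_KEY: 50, PERCENT_USED_CRITICAL_KEY: 80}
    new = {PERCENT_USED_WARNING_KEY: 90, PERCENT_USED_CRITICAL_KEY: 95}
    print(classify(78, old), classify(78, new))
```

In this sketch, as long as the alert definition still carries 50 and 80, a disk at 78% stays in WARNING no matter what the default constants in the file say.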

GET api/v1/clusters/<clusterName>/alert_definitions?AlertDefinition/name=ambari_agent_disk_usage&fields=*

{
  "href": "http://localhost:8080/api/v1/clusters/c1/alert_definitions?AlertDefinition/name=ambari_agent_disk_usage&fields=*",
  "items": [
    {
      "href": "http://localhost:8080/api/v1/clusters/c1/alert_definitions/62",
      "AlertDefinition": {
        "cluster_name": "c1",
        "component_name": "AMBARI_AGENT",
        "description": "This host-level alert is triggered if the amount of disk space used goes above specific thresholds. The default threshold values are 50% for WARNING and 80% for CRITICAL.",
        "enabled": true,
        "help_url": null,
        "id": 62,
        "ignore_host": false,
        "interval": 1,
        "label": "Host Disk Usage",
        "name": "ambari_agent_disk_usage",
        "repeat_tolerance": 1,
        "repeat_tolerance_enabled": false,
        "scope": "HOST",
        "service_name": "AMBARI",
        "source": {
          "parameters": [
            {
              "name": "minimum.free.space",
              "display_name": "Minimum Free Space",
              "units": "bytes",
              "value": 5000000000,
              "description": "The overall amount of free disk space left before an alert is triggered.",
              "type": "NUMERIC",
              "threshold": "WARNING"
            },
            {
              "name": "percent.used.space.warning.threshold",
              "display_name": "Warning",
              "units": "%",
              "value": 50,
              "description": "The percent of disk space consumed before a warning is triggered.",
              "type": "PERCENT",
              "threshold": "WARNING"
            },
            {
              "name": "percent.free.space.critical.threshold",
              "display_name": "Critical",
              "units": "%",
              "value": 80,
              "description": "The percent of disk space consumed before a critical alert is triggered.",
              "type": "PERCENT",
              "threshold": "CRITICAL"
            }
          ],
          "path": "alert_disk_space.py",
          "type": "SCRIPT"
        }
      }
    }
  ]
}

Make sure you PUT to the correct ID of the alert_definition (62 in the output above; yours may differ):

PUT http://localhost:8080/api/v1/clusters/<clusterName>/alert_definitions/62
{
  "AlertDefinition": {
    "cluster_name": "c1",
    "component_name": "AMBARI_AGENT",
    "description": "This host-level alert is triggered if the amount of disk space used goes above specific thresholds. The default threshold values are 50% for WARNING and 80% for CRITICAL.",
    "enabled": true,
    "help_url": null,
    "id": 62,
    "ignore_host": false,
    "interval": 1,
    "label": "Host Disk Usage",
    "name": "ambari_agent_disk_usage",
    "repeat_tolerance": 1,
    "repeat_tolerance_enabled": false,
    "scope": "HOST",
    "service_name": "AMBARI",
    "source": {
      "parameters": [
        {
          "name": "minimum.free.space",
          "display_name": "Minimum Free Space",
          "units": "bytes",
          "value": 5000000000,
          "description": "The overall amount of free disk space left before an alert is triggered.",
          "type": "NUMERIC",
          "threshold": "WARNING"
        },
        {
          "name": "percent.used.space.warning.threshold",
          "display_name": "Warning",
          "units": "%",
          "value": 90,
          "description": "The percent of disk space consumed before a warning is triggered.",
          "type": "PERCENT",
          "threshold": "WARNING"
        },
        {
          "name": "percent.free.space.critical.threshold",
          "display_name": "Critical",
          "units": "%",
          "value": 95,
          "description": "The percent of disk space consumed before a critical alert is triggered.",
          "type": "PERCENT",
          "threshold": "CRITICAL"
        }
      ],
      "path": "alert_disk_space.py",
      "type": "SCRIPT"
    }
  }
}
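
The GET-then-PUT round trip can also be scripted. A minimal sketch using only the Python standard library, assuming basic authentication; the helper names `with_thresholds` and `put_definition`, the base URL, and the credentials are placeholders, and the item passed in is one entry from the "items" list of the GET response with its "href" key removed:

```python
import base64
import copy
import json
from urllib.request import Request, urlopen

WARNING_PARAM = "percent.used.space.warning.threshold"
CRITICAL_PARAM = "percent.free.space.critical.threshold"


def with_thresholds(alert_definition, warning, critical):
    """Return a copy of the definition with new threshold values.

    alert_definition is a dict shaped like {"AlertDefinition": {...}}.
    """
    updated = copy.deepcopy(alert_definition)
    new_values = {WARNING_PARAM: warning, CRITICAL_PARAM: critical}
    for param in updated["AlertDefinition"]["source"]["parameters"]:
        if param["name"] in new_values:
            param["value"] = new_values[param["name"]]
    return updated


def put_definition(base_url, cluster, definition, user, password):
    """PUT the definition back to Ambari; returns the HTTP status code."""
    definition_id = definition["AlertDefinition"]["id"]
    url = "{0}/api/v1/clusters/{1}/alert_definitions/{2}".format(
        base_url.rstrip("/"), cluster, definition_id
    )
    credentials = "{0}:{1}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = Request(url, data=json.dumps(definition).encode("utf-8"), method="PUT")
    request.add_header("Authorization", "Basic " + token)
    # Ambari rejects modifying requests that lack this header.
    request.add_header("X-Requested-By", "ambari")
    with urlopen(request) as response:
        return response.status
```

For example, `put_definition("http://localhost:8080", "c1", with_thresholds(item, 90, 95), user, password)` would send the same body as the PUT shown above. After the PUT, repeat the GET to confirm that the stored values changed.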

Re: Modified Ambari Disk alert Threshold is not getting into Effect

Expert Contributor

@Jonathan I really appreciate your reply. I have already done the steps you describe (GET and PUT), as mentioned in my question; they are the same steps you list for changing the threshold. The HDP version is the latest one, 2.4. Regarding the stale alerts, it shows detained, zookeeper services, etc., but all of them are running, green, and reporting. These are old alerts that are not removed even when I disable and re-enable the alerts.

This is really strange and surprising, so I posted here in case something turns up, such as a hidden issue somewhere or a bug.
