Community Articles

kkanchu · 05-08-2018

As an extension to the article mentioned here, we use a custom Ambari alert to monitor the current health of the JournalNode (JN) edits.

With the default monitoring in Ambari, we are not alerted when edits fail on one of the JournalNodes in the quorum. A typical HDFS HA environment deploys three JournalNode daemons. If any one of them fails to maintain the edits, we are at risk of failovers, and of an eventual cluster outage if a second JournalNode hits a similar issue (if a quorum of edits is not maintained, the NameNode fails to stay up). Hence, we need an alerting mechanism for such failures. A JournalNode may stop being updated for various reasons, such as:

1. Disk getting full.
2. Corrupt permissions.
3. Exhausted HDFS handlers on the JN host, etc.

The attached artifacts contain:

1. alerts-test.json 
2. jn_edits_tracker.py
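For orientation, the sketch below builds a payload in the general shape Ambari uses for a SCRIPT-type alert definition. The actual contents of alerts-test.json are not reproduced in this article, so every value here is an assumption for illustration: the name, label, description, interval, and script path are hypothetical, and the attached file should be treated as the reference.

```python
import json

# Hypothetical alert definition; NOT the contents of the attached alerts-test.json.
# The field layout follows the usual Ambari SCRIPT alert shape (assumption).
alert_definition = {
    "AlertDefinition": {
        "name": "jn_edits_tracker",            # hypothetical name
        "label": "JournalNode Edits Health",   # hypothetical label
        "description": "Alerts when edits_inprogress stops being updated.",
        "service_name": "HDFS",
        "component_name": "JOURNALNODE",
        "interval": 1,                         # assumed check interval, in minutes
        "enabled": True,
        "scope": "ANY",
        "source": {
            "type": "SCRIPT",
            # Assumed to match the copy location used in the configuration steps.
            "path": "/var/lib/ambari-server/resources/host_scripts/jn_edits_tracker.py",
            "parameters": [],
        },
    }
}

# Serialize to the JSON text that would be sent as the request body.
payload_text = json.dumps(alert_definition, indent=2)
```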

jn_edits_tracker.py has the following preconfigured values:

OK_CEIL = 9
WARN_FLOOR = 10
WARN_CEIL = 19
CRITICAL_FLOOR = 20

These values define the time ranges, in seconds, at which each alert state is triggered. Ambari raises an alert if the "edits_inprogress" file has not been updated within the configured interval.
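The threshold logic can be sketched as follows. This is a minimal illustration, not the attached script: the function names are hypothetical, the directory passed in is assumed to be the JournalNode "current" directory holding the edits_inprogress file, and truncating the age to whole seconds is an assumption made so that the four constants cover every value.

```python
import glob
import os
import time

# Threshold values quoted from the article (seconds).
OK_CEIL = 9
WARN_FLOOR = 10
WARN_CEIL = 19
CRITICAL_FLOOR = 20


def classify_age(age_seconds):
    """Map seconds since the last edits update to an alert state."""
    age = int(age_seconds)  # assumption: compare on whole seconds
    if age <= OK_CEIL:
        return "OK"
    if WARN_FLOOR <= age <= WARN_CEIL:
        return "WARNING"
    return "CRITICAL"  # age >= CRITICAL_FLOOR


def edits_inprogress_age(current_dir, now=None):
    """Seconds since the newest edits_inprogress_* file was modified.

    Returns None when no such file exists in current_dir.
    """
    pattern = os.path.join(current_dir, "edits_inprogress_*")
    mtimes = [os.path.getmtime(path) for path in glob.glob(pattern)]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return max(0.0, now - max(mtimes))


def check_journal_dir(current_dir):
    """Return a (state, message) pair for one JournalNode edits directory."""
    age = edits_inprogress_age(current_dir)
    if age is None:
        return "UNKNOWN", "no edits_inprogress file found in %s" % current_dir
    return classify_age(age), "edits_inprogress last updated %d seconds ago" % age
```

Returning a state together with a short message mirrors how script-based alerts generally report results. How the attached script locates the edits directory and reports its state may differ.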

Steps to configure the alert

1. Copy the jn_edits_tracker.py to /var/lib/ambari-server/resources/host_scripts

2. Restart the Ambari server.

3. Run the following command to list all the existing alerts:

curl -u admin:admin -i -H 'X-Requested-By:ambari' -X GET http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions

4. Install the custom alert using the following curl command:

curl -u admin:admin -i -H 'X-Requested-By:ambari' -X POST -d @alerts-test.json  http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions
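For environments where the calls are scripted rather than run by hand, the two curl commands above can be approximated in Python with the standard library. This is a sketch under assumptions: the host, cluster name, and admin:admin credentials are the example values from the commands above, and the helper names build_request and send are hypothetical.

```python
import base64
import json
import urllib.request

# Example endpoint taken from the curl commands in the steps above.
AMBARI_URL = "http://node1.example.com:8080/api/v1/clusters/ClusterDemo/alert_definitions"


def build_request(url, user, password, payload=None):
    """Build a GET request (no payload) or a POST request (JSON payload)."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    headers = {
        "Authorization": "Basic " + token,  # equivalent of curl -u user:password
        "X-Requested-By": "ambari",         # header used in the curl commands
    }
    data = None
    method = "GET"
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        method = "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and return (HTTP status, response body text)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Usage (requires a reachable Ambari server, so it is left commented out):
# with open("alerts-test.json") as handle:
#     definition = json.load(handle)
# print(send(build_request(AMBARI_URL, "admin", "admin")))              # list
# print(send(build_request(AMBARI_URL, "admin", "admin", definition)))  # install
```

Keeping request construction separate from sending makes it possible to inspect the method, headers, and body before anything is posted to the cluster.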

Attachments : jneditsarchive.zip

HDFS Journal Node edits health checker