Member since: 01-20-2017
Posts: 17
Kudos Received: 1
Solutions: 0
10-10-2018
06:20 PM
Sorry to come to this party so late, but the script as presented at
https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html doesn't work on CentOS 7 + Python 2.7 + Ambari 2.6.2.2. I can write a mean bash script, but I'm not a Python coder. In spite of my deficiencies, I got things working.
As Dmitro implies, the script by default tries to assess utilization of all mounts, not just mounted block devices - and when you're looking at shared memory or proc objects and similar, that quickly becomes problematic. The solution posted here - a custom list of mount points - works, but isn't flexible. Without extensively rewriting the script, it's better to just strip out entries like '/sys', '/proc', '/dev', and '/run'. We also need to strip out the stray 'net_prio' and 'cpuacct' entries, which show up because the combined cgroup mounts (e.g. /sys/fs/cgroup/cpu,cpuacct) contain commas and get broken apart when the script splits the mount list on commas.
So, with the understanding that there's almost certainly a better way to do this, I changed:

print "mountPoints = " + mountPoints
mountPointsList = mountPoints.split(",")
print mountPointsList
for l in mountPointsList:

to:

print "mountPoints = " + mountPoints
mountPointsList = mountPoints.split(",")
mountPointsList = [ x for x in mountPointsList if not x.startswith('net_pri')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('cpuacc')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/sys')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/proc')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/run')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/dev')]
print mountPointsList
for l in mountPointsList:

And it works. It's perhaps also worth noting that to get the script to run from the command line, you'll need to symlink several of Ambari's library directories into the system Python path, similar to:

ln -s /usr/lib/ambari-server/lib/resource_management /usr/lib/python2.7/site-packages/
ln -s /usr/lib/ambari-server/lib/ambari_commons /usr/lib/python2.7/site-packages/
ln -s /usr/lib/ambari-server/lib/ambari_simplejson /usr/lib/python2.7/site-packages/

After that, you can run it like so:

# python test_alert_disk_space.py
mountPoints = ,/sys,/proc,/dev,/sys/kernel/security,/dev/shm,/dev/pts,/run,/sys/fs/cgroup,/sys/fs/cgroup/systemd,/sys/fs/pstore,/sys/fs/cgroup/cpu,cpuacct,/sys/fs/cgroup/net_cls,net_prio,/sys/fs/cgroup/hugetlb,/sys/fs/cgroup/blkio,/sys/fs/cgroup/devices,/sys/fs/cgroup/perf_event,/sys/fs/cgroup/freezer,/sys/fs/cgroup/cpuset,/sys/fs/cgroup/memory,/sys/fs/cgroup/pids,/sys/kernel/config,/,/sys/fs/selinux,/proc/sys/fs/binfmt_misc,/dev/mqueue,/sys/kernel/debug,/dev/hugepages,/data,/boot,/proc/sys/fs/binfmt_misc,/run/user/1000,/run/user/0
['', '/', '/data', '/boot']
---------- l :
FINAL finalResultCode CODE .....
---------- l : /
/
disk_usage.total
93365735424
=>OK
FINAL finalResultCode CODE .....OK
---------- l : /data
/data
disk_usage.total
1063256064
=>OK
FINAL finalResultCode CODE .....OK
---------- l : /boot
/boot
disk_usage.total
1063256064
=>OK
FINAL finalResultCode CODE .....OK
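As a quick cross-check from the shell (my own sketch, not part of the alert script), you can list the mount points that survive the same prefix filter and compare them against the filtered list above:

# Print the mount-point column of /proc/mounts, drop the pseudo-filesystem
# prefixes excluded above (/sys, /proc, /dev, /run), and de-duplicate.
# The result should roughly match the filtered mountPointsList (here: /, /boot, /data).
awk '{print $2}' /proc/mounts | grep -Ev '^(/sys|/proc|/dev|/run)' | sort -u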
10-10-2018
03:32 PM
The script presented here doesn't work on CentOS 7 + Python 2.7 + Ambari 2.6.2.2. See https://community.hortonworks.com/questions/160781/custom-ambari-alerts-error-custom-ambari-alerts.html for additional discussion.
09-26-2018
08:15 PM
There's a typo in the Step 1 command line - you need a space between the closing single quote and the Ambari URL. Also please be advised that I just encountered this issue in both Ambari 2.6.0.0 and 2.6.2.2. YMMV.
08-27-2018
08:30 PM
The article doesn't indicate this, so for reference, the listed HDFS settings do not exist by default. These settings, as shown below, need to go into hdfs-site.xml, which is done in Ambari by adding fields under "Custom hdfs-site":

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as Standby; the article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html got me the missing step of running a ZK format. I have not tested the steps below against a Production cluster, and if you foolishly choose to follow these steps, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-Prod environment:

01) Note the Active NameNode (a state-check sketch follows below).
02) In Ambari, stop ALL services except for ZooKeeper.
03) In Ambari, make the indicated changes to HDFS.
04) Get to the command line on the Active NameNode (see Step 1 above).
05) At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
06) Start the JournalNodes.
07) Start the ZKFCs.
08) Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
09) Start the DataNodes.
10) Restart / Refresh any remaining HDFS components which have stale configs.
11) Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
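For steps 01 and 08, one way to confirm which NameNode is Active and which is Standby is `hdfs haadmin`; note that the service IDs nn1/nn2 below are placeholders - use the values from dfs.ha.namenodes.<your-nameservice> in your hdfs-site.xml:

# Report the HA state (active / standby) of each NameNode.
# nn1 / nn2 are assumed service IDs - substitute your own.
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2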
08-02-2018
07:29 PM
Bless you!!
05-18-2018
04:01 PM
In my experience, if you remove the indicated flags, you still get audit logging - but those logs never get purged. Perhaps it would be better to leave the flags in place but change "INFO" to "OFF", rendering something like `-Dhdfs.audit.logger=OFF,DRFAAUDIT`?
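For concreteness, a rough sketch of making that change by hand; the config path below is the usual HDP location but is an assumption on my part - adjust for your layout (or make the equivalent edit in Ambari's hadoop-env template):

# Locate the audit logger flag in the NameNode environment settings
# (path assumed; typical for HDP installs):
grep "hdfs.audit.logger" /etc/hadoop/conf/hadoop-env.sh
# Then change the level while leaving the DRFAAUDIT appender in place:
#   before:  -Dhdfs.audit.logger=INFO,DRFAAUDIT
#   after:   -Dhdfs.audit.logger=OFF,DRFAAUDIT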
05-26-2017
01:04 PM
And that said, I actually restarted Ambari as well - so I can't say for certain that the agent restart was sufficient; it may well have been the agents plus Ambari which did the trick.
05-26-2017
12:42 PM
I ran into this same issue, but unlike the original poster, restarting the Ambari agents on the data nodes was sufficient to clear the alarm.
04-17-2017
02:23 PM
1 Kudo
The only trick here is that if the failed namenode is offline (which it is, because, well, it's failed), the first 3 commands in the answer will fail because the hdfs shell can't talk to the failed namenode. My workaround was:

1) Edit /etc/hosts on the working namenode to add the failed namenode's hostname to the same line which defines the working node. E.g.,

192.168.1.27 workingnode.domain.com workingnode
=>
192.168.1.27 workingnode.domain.com workingnode failednode.domain.com failednode

2) Issue the first 3 commands listed in the answer.
3) Undo the changes to /etc/hosts.
4) Issue the 4th and 5th commands listed in the answer.
Is there a better way? Is there a way to force the working active namenode into safe mode even if the secondary is offline?
04-16-2017
04:03 AM
Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?