Member since: 01-20-2017
Posts: 17
Kudos Received: 1
Solutions: 0
10-10-2018
06:20 PM
Sorry to come to this party so late, but the script as presented at
https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html doesn't work on CentOS 7 + Python 2.7 + Ambari 2.6.2.2. I can write a mean bash script, but I'm not a Python coder. In spite of my deficiencies, I got things working.
As Dmitro implies, the script by default tries to assess utilization of all mounts, not just mounted block devices - and when you're looking at shared memory or proc objects and similar, that quickly becomes problematic. The solution posted here - a custom list of mount points - works, but isn't flexible. Without extensively rewriting the script, it's better to just strip out entries like '/sys', '/proc', '/dev', and '/run'. We also need to strip out the stray 'net_prio' and 'cpuacct' entries, which show up because the combined cgroup mounts (e.g. /sys/fs/cgroup/cpu,cpuacct) contain commas and get broken apart when the script splits the mount list on commas.
So, with the understanding that there's almost certainly a better way to do this, I changed:

print "mountPoints = " + mountPoints
mountPointsList = mountPoints.split(",")
print mountPointsList
for l in mountPointsList:

to:

print "mountPoints = " + mountPoints
mountPointsList = mountPoints.split(",")
mountPointsList = [ x for x in mountPointsList if not x.startswith('net_pri')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('cpuacc')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/sys')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/proc')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/run')]
mountPointsList = [ x for x in mountPointsList if not x.startswith('/dev')]
print mountPointsList
for l in mountPointsList:

And it works. It's perhaps also worth noting that to get the script to run from the command line, you'll need to symlink several of Ambari's library directories into the system Python path, similar to:

ln -s /usr/lib/ambari-server/lib/resource_management /usr/lib/python2.7/site-packages/
ln -s /usr/lib/ambari-server/lib/ambari_commons /usr/lib/python2.7/site-packages/
ln -s /usr/lib/ambari-server/lib/ambari_simplejson /usr/lib/python2.7/site-packages/

After that, you can run it like so:

# python test_alert_disk_space.py
mountPoints = ,/sys,/proc,/dev,/sys/kernel/security,/dev/shm,/dev/pts,/run,/sys/fs/cgroup,/sys/fs/cgroup/systemd,/sys/fs/pstore,/sys/fs/cgroup/cpu,cpuacct,/sys/fs/cgroup/net_cls,net_prio,/sys/fs/cgroup/hugetlb,/sys/fs/cgroup/blkio,/sys/fs/cgroup/devices,/sys/fs/cgroup/perf_event,/sys/fs/cgroup/freezer,/sys/fs/cgroup/cpuset,/sys/fs/cgroup/memory,/sys/fs/cgroup/pids,/sys/kernel/config,/,/sys/fs/selinux,/proc/sys/fs/binfmt_misc,/dev/mqueue,/sys/kernel/debug,/dev/hugepages,/data,/boot,/proc/sys/fs/binfmt_misc,/run/user/1000,/run/user/0
['', '/', '/data', '/boot']
---------- l :
FINAL finalResultCode CODE .....
---------- l : /
/
disk_usage.total
93365735424
=>OK
FINAL finalResultCode CODE .....OK
---------- l : /data
/data
disk_usage.total
1063256064
=>OK
FINAL finalResultCode CODE .....OK
---------- l : /boot
/boot
disk_usage.total
1063256064
=>OK
FINAL finalResultCode CODE .....OK
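As a quick cross-check from the shell (my own sketch, not part of the alert script), you can list the mount points that survive the same prefix filter and compare them against the filtered list above:

# Print the mount-point column of /proc/mounts, drop the pseudo-filesystem
# prefixes excluded above (/sys, /proc, /dev, /run), and de-duplicate.
# The result should roughly match the filtered mountPointsList (here: /, /boot, /data).
awk '{print $2}' /proc/mounts | grep -Ev '^(/sys|/proc|/dev|/run)' | sort -u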
10-10-2018
03:32 PM
The script presented here doesn't work on CentOS 7 + Python 2.7 + Ambari 2.6.2.2. See https://community.hortonworks.com/questions/160781/custom-ambari-alerts-error-custom-ambari-alerts.html for additional discussion.
09-26-2018
08:15 PM
There's a typo in the Step 1 command line - you need a space between the closing single quote and the Ambari URL. Also please be advised that I just encountered this issue in both Ambari 2.6.0.0 and 2.6.2.2. YMMV.
08-27-2018
08:30 PM
The article doesn't indicate this, so for reference, the listed HDFS settings do not exist by default. These settings, as shown below, need to go into hdfs-site.xml, which is done in Ambari by adding fields under "Custom hdfs-site":

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as Standby; the article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html got me the missing step of running a ZK format. I have not tested the steps below against a Production cluster, and if you foolishly choose to follow these steps, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-Prod environment:

01) Note the Active NameNode (a state-check sketch follows below).
02) In Ambari, stop ALL services except for ZooKeeper.
03) In Ambari, make the indicated changes to HDFS.
04) Get to the command line on the Active NameNode (see Step 1 above).
05) At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
06) Start the JournalNodes.
07) Start the ZKFCs.
08) Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
09) Start the DataNodes.
10) Restart / Refresh any remaining HDFS components which have stale configs.
11) Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
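For steps 01 and 08, one way to confirm which NameNode is Active and which is Standby is `hdfs haadmin`; note that the service IDs nn1/nn2 below are placeholders - use the values from dfs.ha.namenodes.<your-nameservice> in your hdfs-site.xml:

# Report the HA state (active / standby) of each NameNode.
# nn1 / nn2 are assumed service IDs - substitute your own.
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2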
08-02-2018
07:29 PM
Bless you!!
05-18-2018
04:01 PM
In my experience, if you remove the indicated flags, you still get audit logging - but those logs never get purged. Perhaps it would be better to leave the flags in place but change "INFO" to "OFF", rendering something like `-Dhdfs.audit.logger=OFF,DRFAAUDIT`?
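For concreteness, a rough sketch of making that change by hand; the config path below is the usual HDP location but is an assumption on my part - adjust for your layout (or make the equivalent edit in Ambari's hadoop-env template):

# Locate the audit logger flag in the NameNode environment settings
# (path assumed; typical for HDP installs):
grep "hdfs.audit.logger" /etc/hadoop/conf/hadoop-env.sh
# Then change the level while leaving the DRFAAUDIT appender in place:
#   before:  -Dhdfs.audit.logger=INFO,DRFAAUDIT
#   after:   -Dhdfs.audit.logger=OFF,DRFAAUDIT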
05-26-2017
01:04 PM
And that said, I actually restarted Ambari as well - so I can't say for certain that the agent restart was sufficient; it may well have been the agents plus Ambari which did the trick.
05-26-2017
12:42 PM
I ran into this same issue, but unlike the original poster, restarting the Ambari agents on the data nodes was sufficient to clear the alarm.
04-17-2017
02:23 PM
1 Kudo
The only trick here is that if the failed namenode is offline (which it is, because, well, it's failed), the first 3 commands in the answer will fail because the hdfs shell can't talk to the failed namenode. My workaround was:

1) Edit /etc/hosts on the working namenode to add the failed namenode's hostname to the same line which defines the working node. E.g.,

192.168.1.27 workingnode.domain.com workingnode
=>
192.168.1.27 workingnode.domain.com workingnode failednode.domain.com failednode

2) Issue the first 3 commands listed in the answer.
3) Undo the changes to /etc/hosts.
4) Issue the 4th and 5th commands listed in the answer.
Is there a better way? Is there a way to force the working active namenode into safe mode even if the secondary is offline?
04-16-2017
04:03 AM
Potentially silly question: When you set the rep count, do you count the "original" data block as well? For example, I have 3 data nodes and I want one block on each of those nodes (3 blocks total). Is that 2 replicas or 3?