Created 07-04-2016 04:04 PM
Hi everybody, I need your help. This alert appears in my cluster:
There are 1 stale alerts from 1 host(s): node20.com[DataNode Unmounted Data Dir (4m), DataNode Process (4m), DataNode Web UI (4m), DataNode Storage (4m), NodeManager Web UI (4m), NodeManager Health (4m), Host Disk Usage (4m)]
I have an open case with Hortonworks support, but the issue has persisted for many days. Does anybody have any information about this alert?
Regards,
Created 07-04-2016 04:53 PM
First, identify the Ambari Agent nodes that have this problem.
Then, on Ambari 2.2 or later, try increasing alert_grace_period from the default of 5 seconds to 10. It can be modified in /etc/ambari-agent/conf/ambari-agent.ini
For versions before 2.2, see https://community.hortonworks.com/questions/9762/how-to-get-rid-of-stale-alerts-in-ambari.html
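For reference, the change described above could be applied roughly as follows. This is only a sketch: the helper name set_grace_period is made up for illustration, it assumes GNU sed (as found on RHEL/CentOS), and it assumes the setting belongs under the [agent] section of ambari-agent.ini. Check your own file before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ambari): set alert_grace_period in an
# Ambari agent config file. Usage: set_grace_period <ini-file> <seconds>
set_grace_period() {
    ini="$1"
    seconds="$2"
    if grep -q '^alert_grace_period=' "$ini"; then
        # Key already present: rewrite its value in place, keeping a .bak copy.
        sed -i.bak "s/^alert_grace_period=.*/alert_grace_period=${seconds}/" "$ini"
    else
        # Key missing: append it right after the [agent] header (assumed location).
        # If the file has no [agent] section, nothing is changed.
        sed -i.bak "/^\[agent\]/a alert_grace_period=${seconds}" "$ini"
    fi
}

# Example, run as root on the affected node, followed by an agent restart:
#   set_grace_period /etc/ambari-agent/conf/ambari-agent.ini 10
#   ambari-agent restart
```

The agent only reads the file at startup, so the new value has no effect until ambari-agent is restarted.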
Created on 07-04-2016 06:05 PM - edited 08-19-2019 04:17 AM
I have changed the parameter on the node but I get the same result 😞
Created 07-04-2016 06:10 PM
@Luis Picazo When a node runs a large number of components, the alert checks are not spaced out and fail more often. Did you restart the ambari-agent?
Created 07-04-2016 09:41 PM
Yes, I restarted the ambari-agent.
The disk space on this node is:
[root@~]# df -h
Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/vg_mixsfwdebda20-lvroot     35G  4.5G   29G  14% /
tmpfs                                   64G  8.0K   64G   1% /dev/shm
/dev/sda1                              976M   33M  893M   4% /boot
/dev/mapper/vg_grid-lvgrid              28T  7.8T   20T  29% /grid
/dev/mapper/vg_mixsfwdebda20-lvvar     168G  8.0G  151G   5% /var
VNX:/MUM                               148G   43G  106G  29% /VFS_MUM
mixsfwdebda05:/grid/nfs/mixsfwdebda20   28T  1.2T   27T   5% /nfs/mixsfwdebda20
[root@ ~]# su - hdfs
[hdfs@ ~]$ hdfs dfs -df -h
Filesystem                     Size    Used  Available  Use%
hdfs://mktClusterProd:8020  440.0 T  93.2 T    332.3 T   21%
Thanks for your help
Created 07-28-2016 03:13 PM
Please sync up your time between the nodes.
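A quick way to see whether the clocks have drifted is to compare epoch timestamps over ssh. The following is a rough sketch, not an Ambari tool: the function names are invented for illustration, it assumes passwordless ssh from the host where you run it, and the result includes ssh latency, so a difference of a second or two is just noise.

```shell
#!/bin/sh
# Print the absolute difference, in seconds, between two epoch timestamps.
skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Compare the local clock with each host given as an argument.
# Hypothetical usage: check_skew node20.com
check_skew() {
    for host in "$@"; do
        remote=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        local_now=$(date +%s)
        echo "$host: $(skew_seconds "$local_now" "$remote")s"
    done
}
```

If a node shows a large difference, fix its NTP setup rather than setting the clock by hand, so that it stays in sync.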
Since we went through this, can you mark this as answered?
Thanks
Italo
Created 07-28-2016 04:03 PM
Thanks Italo, I no longer have the stale alerts.
Regards.
Created 05-05-2017 10:45 AM
@felix albani: I followed your suggestions and it worked for me, but I would appreciate it if you could also briefly explain how it resolved the problem.