Created 07-04-2016 04:04 PM
Hi everybody, I need your help. This alert appears in my cluster:
There are 1 stale alerts from 1 host(s): node20.com[DataNode Unmounted Data Dir (4m), DataNode Process (4m), DataNode Web UI (4m), DataNode Storage (4m), NodeManager Web UI (4m), NodeManager Health (4m), Host Disk Usage (4m)]
I have an open case with Hortonworks support, but this issue has persisted for many days. Does anybody have information about this alert?
Regards,
Created 07-04-2016 04:53 PM
First, identify the Ambari Agent nodes that have this problem.
Then, on Ambari 2.2 or later, try increasing alert_grace_period from its default of 5 seconds to 10. It can be modified in /etc/ambari-agent/conf/ambari-agent.ini
For versions before 2.2, see https://community.hortonworks.com/questions/9762/how-to-get-rid-of-stale-alerts-in-ambari.html
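The change could be scripted roughly as follows. This is only a sketch: set_grace_period is a hypothetical helper, it assumes GNU sed, and it assumes the key belongs under the [agent] section of the file, so check your own ambari-agent.ini before running it.

```shell
# Sketch: set alert_grace_period in an Ambari agent config file.
# Assumption: the key lives under the [agent] section of the ini file.
set_grace_period() {
    conf="$1"      # path to the ini file
    seconds="$2"   # new grace period in seconds
    if grep -q '^alert_grace_period' "$conf"; then
        # Key already present: rewrite its value in place.
        sed -i "s/^alert_grace_period.*/alert_grace_period=${seconds}/" "$conf"
    else
        # Key absent: add it right after the [agent] section header.
        sed -i "/^\[agent\]/a alert_grace_period=${seconds}" "$conf"
    fi
}

# Usage (as root on the affected node), followed by an agent restart:
#   set_grace_period /etc/ambari-agent/conf/ambari-agent.ini 10
#   ambari-agent restart
```

The agent only reads the file at startup, so the new value has no effect until ambari-agent is restarted.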
Created 07-04-2016 05:42 PM
I have only one node in this situation. Let me make the change you recommend.
But how is this parameter related to the stale alert?
Thanks for your time.
Regards from México !
Created on 07-04-2016 06:05 PM - edited 08-19-2019 04:17 AM
I changed the parameter on the node, but I get the same result 😞
Created 07-04-2016 06:10 PM
@Luis Picazo When there are a large number of components, the alert checks are not spaced out and tend to fail more often. Did you restart the ambari-agent?
Created 07-04-2016 09:41 PM
Yes, I restarted the ambari-agent.
The disk space on this node is:
[root@~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/vg_mixsfwdebda20-lvroot      35G  4.5G   29G  14% /
tmpfs                                    64G  8.0K   64G   1% /dev/shm
/dev/sda1                               976M   33M  893M   4% /boot
/dev/mapper/vg_grid-lvgrid               28T  7.8T   20T  29% /grid
/dev/mapper/vg_mixsfwdebda20-lvvar      168G  8.0G  151G   5% /var
VNX:/MUM                                148G   43G  106G  29% /VFS_MUM
mixsfwdebda05:/grid/nfs/mixsfwdebda20    28T  1.2T   27T   5% /nfs/mixsfwdebda20
[root@ ~]# su - hdfs
[hdfs@ ~]$ hdfs dfs -df -h
Filesystem                     Size    Used  Available  Use%
hdfs://mktClusterProd:8020  440.0 T  93.2 T    332.3 T   21%
Thanks for your help
Created 07-28-2016 03:13 PM
Please sync up your time between the nodes.
Since we went through this, can you mark this as answered?
Thanks
Italo
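To check whether the clocks have actually drifted, one could compare epoch timestamps taken on two nodes. A clock that lags can plausibly make recent alert reports look old to the server. The sketch below is only illustrative: clock_skew is a hypothetical helper, not part of Ambari, and the usage lines assume ssh access to node20.com from the Ambari server host.

```shell
# Print the absolute difference in seconds between two epoch timestamps.
clock_skew() {
    a="$1"
    b="$2"
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        # Flip the sign so the result is always non-negative.
        diff=$((0 - diff))
    fi
    echo "$diff"
}

# Usage (run on the Ambari server host; ssh access is an assumption):
#   local_now=$(date +%s)
#   remote_now=$(ssh node20.com date +%s)
#   clock_skew "$remote_now" "$local_now"
```

The ssh round trip adds some latency, so a difference of a second or two does not necessarily indicate drift, while a gap of minutes would be worth fixing.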
Created 07-28-2016 04:03 PM
Thanks Italo, I no longer have the stale alerts.
Regards.
Created 05-05-2017 10:45 AM
@felix albani: I followed your suggestions and they worked for me. Could you also briefly explain how they resolved the problem?