Ambari Stale Alert

Explorer

Hi everybody, I need your help. This alert has appeared in my cluster:

There are 1 stale alerts from 1 host(s): node20.com[DataNode Unmounted Data Dir (4m), DataNode Process (4m), DataNode Web UI (4m), DataNode Storage (4m), NodeManager Web UI (4m), NodeManager Health (4m), Host Disk Usage (4m)]

I have an open case with Hortonworks support, but the issue has persisted for many days. Does anybody have any information about this alert?

Regards,


8 REPLIES

@Luis Picazo

First identify the Ambari Agent nodes which have got this problem.

Then, from Ambari 2.2 onward, try increasing alert_grace_period from its default of 5 seconds to 10. It can be modified in /etc/ambari-agent/conf/ambari-agent.ini

For versions before 2.2, see https://community.hortonworks.com/questions/9762/how-to-get-rid-of-stale-alerts-in-ambari.html
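
For anyone applying the change above on several agents, here is a minimal shell sketch of the edit. It assumes the key is spelled alert_grace_period and belongs in the [agent] section of ambari-agent.ini; the function name set_alert_grace_period is made up for illustration and is not part of Ambari. Back up the file before trying anything like this.

```shell
# Hypothetical helper, not part of Ambari.
# Sets alert_grace_period in an ambari-agent.ini-style file.
# Assumption: the key belongs in the [agent] section.
set_alert_grace_period() {
  ini_file="$1"
  seconds="$2"
  tmp_file="$(mktemp)" || return 1
  if grep -q '^alert_grace_period *=' "$ini_file"; then
    # Key already present: rewrite its value, keep every other line.
    awk -v val="$seconds" '
      /^alert_grace_period *=/ { print "alert_grace_period=" val; next }
      { print }
    ' "$ini_file" > "$tmp_file"
  else
    # Key missing: add it directly under the [agent] header.
    awk -v val="$seconds" '
      { print }
      /^\[agent\]/ { print "alert_grace_period=" val }
    ' "$ini_file" > "$tmp_file"
  fi
  # Write back through cat so the original file's owner and mode are kept.
  cat "$tmp_file" > "$ini_file" && rm -f "$tmp_file"
}
```

Usage would look something like `set_alert_grace_period /etc/ambari-agent/conf/ambari-agent.ini 10`, followed by `ambari-agent restart` so the agent picks up the new value.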

Explorer

@Felix Albani

Only one node is in this situation. Let me make the change you recommend.

But how is this parameter related to the stale alert?

Thanks for your time.

Regards from México !

Explorer

I have changed the parameter on the node, but I get the same result 😞

[Attached image: 5478-quvwh.png]

@Luis Picazo When there is a large number of components, the alert checks are not spaced out and fail more often. Did you restart the ambari-agent?

Explorer

@Felix Albani

Yes, I restarted the ambari-agent.

The disk space on this node is:

[root@~]# df -h

Filesystem                             Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_mixsfwdebda20-lvroot    35G   4.5G  29G    14%   /
tmpfs                                  64G   8.0K  64G    1%    /dev/shm
/dev/sda1                              976M  33M   893M   4%    /boot
/dev/mapper/vg_grid-lvgrid             28T   7.8T  20T    29%   /grid
/dev/mapper/vg_mixsfwdebda20-lvvar     168G  8.0G  151G   5%    /var
VNX:/MUM                               148G  43G   106G   29%   /VFS_MUM
mixsfwdebda05:/grid/nfs/mixsfwdebda20  28T   1.2T  27T    5%    /nfs/mixsfwdebda20

[root@ ~]# su - hdfs

[hdfs@ ~]$ hdfs dfs -df -h

Filesystem                  Size     Used    Available  Use%
hdfs://mktClusterProd:8020  440.0 T  93.2 T  332.3 T    21%

Thanks for your help

Expert Contributor

Please synchronize the system clocks across the nodes.

Since we went through this, can you mark this as answered?

Thanks

Italo
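
A quick way to see whether the clocks have drifted is to compare epoch seconds between the Ambari server and the affected agent host. The sketch below is only an illustration: the function names and the 5 second threshold are assumptions, it expects working ssh access to the host, and it is not a substitute for a properly configured time service such as NTP.

```shell
# Hypothetical helpers, not part of Ambari.

# Print the absolute difference, in seconds, between two epoch timestamps.
clock_skew_seconds() {
  first_epoch="$1"
  second_epoch="$2"
  skew=$((first_epoch - second_epoch))
  if [ "$skew" -lt 0 ]; then
    skew=$((0 - skew))
  fi
  echo "$skew"
}

# Compare the local clock with a remote host's clock over ssh.
# max_skew defaults to 5 seconds, an arbitrary threshold chosen for this sketch.
check_host_skew() {
  host="$1"
  max_skew="${2:-5}"
  remote_epoch="$(ssh "$host" date +%s)" || return 2
  skew="$(clock_skew_seconds "$(date +%s)" "$remote_epoch")"
  if [ "$skew" -gt "$max_skew" ]; then
    echo "$host: clock differs by ${skew}s"
    return 1
  fi
  echo "$host: clock within ${max_skew}s"
}
```

Run from the Ambari server, something like `check_host_skew node20.com` would flag a host whose clock is off. The ssh round trip adds a little delay to the measurement, so treat small differences as noise.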

Explorer

Thanks, Italo. I no longer have the stale alerts.

Regards.

New Contributor

@Felix Albani: I followed your suggestions and they worked for me, but I would appreciate it if you could also briefly explain how they resolved the problem.