Frequently getting stale alerts for DataNodes

Contributor

Hi everyone.

I am getting stale alerts every hour. The alert fires when the last contact between a DataNode and the NameNode was more than 30 seconds ago.

I am not able to find the root cause of this slowness. The system has 32 cores, and when the alert is generated htop shows high usage by the hdfs processes, but not all cores are at 100% utilization.

DataNode Health Summary

DataNode Health: [Live=5, Stale=1, Dead=0]
Please suggest the changes required to resolve this.
1 ACCEPTED SOLUTION

Super Collaborator

Hello @KPG1,

The time after which a DataNode is marked stale is given by dfs.namenode.stale.datanode.interval, which defaults to 30 seconds. If this is happening with a specific DataNode, check whether there are any network issues between that DataNode and the NameNode, and check the DataNode logs for reported JVM pauses. As a band-aid, you can increase this parameter until the underlying problem is solved.
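
To make the log check concrete, here is a minimal Python sketch that scans a DataNode log for JVM pause messages. It assumes the pauses are reported in the form "Detected pause in JVM or host machine (eg GC): pause of approximately <N>ms", which is the message I would expect from Hadoop's JvmPauseMonitor; if your logs are worded differently, adjust the pattern. The function name find_jvm_pauses and the script name are made up for this example.

```python
import re
import sys

# Assumed message format (Hadoop JvmPauseMonitor); adjust if your logs differ.
PAUSE_RE = re.compile(
    r"Detected pause in JVM or host machine.*?pause of approximately (\d+)ms"
)


def find_jvm_pauses(lines, min_pause_ms=0):
    """Return (line_number, pause_ms) for each reported pause >= min_pause_ms."""
    pauses = []
    for number, line in enumerate(lines, start=1):
        match = PAUSE_RE.search(line)
        if match:
            pause_ms = int(match.group(1))
            if pause_ms >= min_pause_ms:
                pauses.append((number, pause_ms))
    return pauses


# Hypothetical usage: python find_pauses.py /path/to/datanode.log [min_pause_ms]
if __name__ == "__main__" and len(sys.argv) > 1:
    threshold = int(sys.argv[2]) if len(sys.argv) > 2 else 0
    with open(sys.argv[1], errors="replace") as log_file:
        for number, pause_ms in find_jvm_pauses(log_file, threshold):
            print(f"line {number}: pause of about {pause_ms} ms")
```

Pauses that come close to the stale interval (30 seconds by default) and line up with the alert times would point to GC or host-level stalls on that DataNode rather than the network.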

