Support Questions

KPG1 · ‎07-17-2022

Hi everyone.

I am getting stale DataNode alerts every hour. The alert fires whenever the last contact between a DataNode and the NameNode is more than 30 seconds old.

I am not able to find the root cause of this slowness. I have a 32-core system, and when the alert is generated htop shows higher usage by HDFS, but not all cores are 100% utilized.

DataNode Health Summary

DataNode Health: [Live=5, Stale=1, Dead=0]

Please suggest changes required to resolve this.

rki_ · ‎07-18-2022

Hello @KPG1,

The time after which a DataNode is marked stale is given by dfs.namenode.stale.datanode.interval, which defaults to 30 seconds (30000 ms). If this is happening with one specific DataNode, check for network issues between that DataNode and the NameNode, and check the DataNode logs for reported JVM pauses. As a band-aid, you can increase this parameter until the underlying problem is solved.
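To make the "check the DataNode logs for JVM pauses" step concrete, here is a minimal Python sketch that scans log lines for pause messages. It assumes the pauses are logged in the format used by Hadoop's JvmPauseMonitor ("Detected pause in JVM or host machine (eg GC): pause of approximately Nms"); verify that against your own logs first. The function name find_jvm_pauses and the min_ms parameter are hypothetical, introduced only for illustration.

```python
import re

# Assumed message format (Hadoop's JvmPauseMonitor); confirm it matches your logs.
PAUSE_RE = re.compile(
    r"Detected pause in JVM or host machine \(eg GC\): "
    r"pause of approximately (\d+)ms"
)


def find_jvm_pauses(lines, min_ms=0):
    """Return (line_number, pause_ms) for each pause message of at least min_ms."""
    pauses = []
    for number, line in enumerate(lines, start=1):
        match = PAUSE_RE.search(line)
        if match:
            pause_ms = int(match.group(1))
            if pause_ms >= min_ms:
                pauses.append((number, pause_ms))
    return pauses
```

To run it against a real log, pass an open file object, for example find_jvm_pauses(open(path), min_ms=1000), where path is wherever your DataNode log lives. Pauses that approach the 30-second stale interval, or many shorter pauses clustered around the alert times, would point to GC or host-level stalls rather than the network.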

Cloudera Community

frequently getting stale alerts for Datanodes.