Frequently getting stale alerts for DataNodes
Labels: Apache Ambari, HDFS
Created ‎07-17-2022 09:48 PM
Hi everyone.
I am getting stale DataNode alerts every hour. The alert fires whenever the last contact between a DataNode and the NameNode is more than 30 seconds old.
I have not been able to find the root cause of this slowness. The system has 32 cores; when the alert is generated, htop shows elevated HDFS usage, but not all cores are at 100% utilization.
DataNode Health Summary
Created ‎07-18-2022 02:00 AM
Hello @KPG1 ,
The time after which a DataNode is marked as stale is given by dfs.namenode.stale.datanode.interval, which defaults to 30 seconds (the value is specified in milliseconds, so 30000). If this is happening with a specific DataNode, check whether there are any network issues between that DataNode and the NameNode, and check the DataNode logs for reported JVM pauses. As a stopgap, you can increase the above parameter until the underlying problem is solved.
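To make the JVM pause check concrete, here is a minimal Python sketch that scans DataNode log lines for pause warnings. It assumes the pauses are logged by Hadoop's JvmPauseMonitor in the form "Detected pause in JVM or host machine (eg GC): pause of approximately NNNNms"; please verify that wording against your own logs. The function name and the 10-second threshold are illustrative choices, not anything defined by HDFS.

```python
import re
from typing import Iterable, List

# Assumed message format (from Hadoop's JvmPauseMonitor); confirm it matches your logs.
PAUSE_RE = re.compile(
    r"Detected pause in JVM or host machine.*?pause of approximately (\d+)ms"
)


def find_long_pauses(lines: Iterable[str], threshold_ms: int = 10_000) -> List[int]:
    """Return the pause durations (in ms) that are at or above threshold_ms.

    threshold_ms is a hypothetical cut-off; pauses approaching the 30000 ms
    stale interval are the ones most likely to trigger the alert.
    """
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            duration_ms = int(match.group(1))
            if duration_ms >= threshold_ms:
                pauses.append(duration_ms)
    return pauses
```

You could call it as `find_long_pauses(open(path, errors="replace"))`, where `path` points at a DataNode log file on the affected host (the location depends on your installation). If the long pauses line up with the alert times, GC or host-level stalls are a more likely cause than the network.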
