How to identify stale datanode?

New Contributor

The Datanode Health Summary in Ambari Alerts reported 1 stale node. How can I identify which datanode is in the stale state?

Accepted Solutions

Re: How to identify stale datanode?

A datanode is considered stale when:

dfs.namenode.stale.datanode.interval < last contact < (2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval)

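In the NameNode UI Datanodes tab, a stale datanode stands out because its Last contact value is larger than that of the other live datanodes (the same value is available in the JMX output). A stale datanode is given the lowest priority for reads and is avoided for writes, provided dfs.namenode.avoid.read.stale.datanode and dfs.namenode.avoid.write.stale.datanode are enabled.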
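As a rough sketch of the JMX route: the NameNode's NameNodeInfo bean exposes a LiveNodes attribute, which is a JSON string keyed by datanode and carrying a lastContact value in seconds. The snippet below assumes the default 30-second stale interval and picks out the entries above it from an already-fetched /jmx payload. The helper name find_stale_datanodes and the sample hostnames are made up for illustration, and the URL in the docstring (port 50070 on Hadoop 2, 9870 on Hadoop 3) is an assumption about your cluster.

```python
import json

# dfs.namenode.stale.datanode.interval default (30000 ms), expressed in seconds
STALE_INTERVAL_S = 30


def find_stale_datanodes(jmx_payload, stale_interval_s=STALE_INTERVAL_S):
    """Return {datanode: lastContact} for live nodes past the stale interval.

    jmx_payload is the parsed JSON from, for example,
    http://<namenode>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
    (host and port are assumptions; adjust them for your cluster).
    """
    bean = jmx_payload["beans"][0]
    # LiveNodes is itself a JSON-encoded string, so it needs a second parse.
    live_nodes = json.loads(bean["LiveNodes"])
    return {
        name: info["lastContact"]
        for name, info in live_nodes.items()
        if info["lastContact"] > stale_interval_s
    }


# Hypothetical payload, trimmed to the fields used here.
sample = {"beans": [{"LiveNodes": json.dumps({
    "dn1.example.com:50010": {"lastContact": 2},
    "dn2.example.com:50010": {"lastContact": 47},
})}]}

print(find_stale_datanodes(sample))
```

With the sample payload only dn2.example.com is reported, since 47 seconds exceeds the 30-second threshold. On a real cluster you would fetch the payload first (for instance with urllib.request) and pass the parsed result in.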

Using default values, the namenode will consider a datanode stale when its heartbeat is absent for 30 seconds. After another 10 minutes without a heartbeat (10.5 minutes total), a datanode is considered dead.

Relevant properties include:

  • dfs.heartbeat.interval - default: 3 seconds
  • dfs.namenode.stale.datanode.interval - default: 30 seconds
  • dfs.namenode.heartbeat.recheck-interval - default: 5 minutes
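  • dfs.namenode.avoid.read.stale.datanode - default: false in Apache Hadoop's hdfs-default.xml (HDP/Ambari-managed clusters typically set it to true)
  • dfs.namenode.avoid.write.stale.datanode - default: false in Apache Hadoop's hdfs-default.xml (HDP/Ambari-managed clusters typically set it to true)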
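
To make the arithmetic concrete, here is a minimal sketch that derives the stale and dead thresholds from the interval properties listed above and classifies a datanode by its Last contact value. The NameNode computes the dead timeout as 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval, which is where the 10.5 minutes comes from. The function name classify_datanode is hypothetical, and the boundary handling is simplified; it is not the NameNode's actual code.

```python
def classify_datanode(last_contact_s,
                      heartbeat_interval_s=3,        # dfs.heartbeat.interval
                      stale_interval_ms=30_000,      # dfs.namenode.stale.datanode.interval
                      recheck_interval_ms=300_000):  # dfs.namenode.heartbeat.recheck-interval
    """Classify a datanode as 'live', 'stale', or 'dead' (illustrative only)."""
    # Dead timeout: 2 * recheck interval + 10 heartbeats, all in milliseconds.
    dead_after_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
    last_contact_ms = last_contact_s * 1000

    if last_contact_ms > dead_after_ms:
        return "dead"
    if last_contact_ms > stale_interval_ms:
        return "stale"
    return "live"


# With the defaults, the dead timeout is 630000 ms, i.e. 10.5 minutes.
for seconds in (2, 45, 700):
    print(seconds, classify_datanode(seconds))
```

Passing your cluster's own values for the three intervals shows how far the stale window extends before the node is declared dead.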

This feature was introduced by HDFS-3703.

8 REPLIES

Re: How to identify stale datanode?

Cloudera Employee

The namenode logs should probably say which one it is.

Re: How to identify stale datanode?

@ayusuf@hortonworks.com

This is a good explanation. The namenode will know about the stale DN.

dfs.namenode.stale.datanode.interval

Default time interval for marking a datanode as "stale", i.e., if the namenode has not received heartbeat msg from a datanode for more than this time interval, the datanode will be marked and treated as "stale" by default. The stale interval cannot be too small since otherwise this may cause too frequent change of stale states. We thus set a minimum stale interval value (the default value is 3 times of heartbeat interval) and guarantee that the stale interval cannot be less than the minimum value. A stale data node is avoided during lease/block recovery. It can be conditionally avoided for reads (see dfs.namenode.avoid.read.stale.datanode) and for writes (see dfs.namenode.avoid.write.stale.datanode).

Re: How to identify stale datanode?

Nicely explained! Thanks @Alex Miller

Re: How to identify stale datanode?

New Contributor

Thanks Alex. Very good explanation. :-( I learned this the hard way last night, by bringing the network down (ifdown eth1) while the datanode was up on one of the VM nodes and refreshing the Namenode UI -> Datanode tab. @Alex Miller

Re: How to identify stale datanode?

Thanks, hopefully it will save someone the hassle in the future.

In the future, please leave this as a comment rather than a separate answer.

Re: How to identify stale datanode?

New Contributor

I agree. Sorry, I'm using AH for the first time and accidentally clicked reply instead of comment :-(

Re: How to identify stale datanode?

No worries, we're all learning it as we go.
