JMX metric is slow to update. Is this delay normal?

Expert Contributor

Hello,

I poll JMX metrics periodically to monitor cluster health.

When I checked my monitoring platform, I saw that one metric is slow to update. The test case is a dead DataNode: I stopped one of the DataNode services in Ambari and expected the value below to change from 0 to 1:

http://namenodeaddress:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState

{
    "name" : "Hadoop:service=NameNode,name=FSNamesystemState",
    "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
    ...
    "NumDeadDataNodes" : 0,
    ...

...
  }
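The check described above can be sketched in Python as follows. This is only an illustration, not the poster's actual monitoring code: the function names are hypothetical, the host name is the placeholder from the URL above, and the sketch assumes the /jmx servlet wraps the matching MBeans in a top-level "beans" list.

```python
import json
from urllib.request import urlopen

# Placeholder address taken from the question; replace with a real NameNode host.
JMX_URL = ("http://namenodeaddress:50070/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")


def extract_dead_datanodes(payload: str) -> int:
    """Return NumDeadDataNodes from the body of a /jmx JSON response."""
    doc = json.loads(payload)
    # Assumption: matching MBeans are listed under a top-level "beans" key.
    for bean in doc.get("beans", []):
        if "NumDeadDataNodes" in bean:
            return int(bean["NumDeadDataNodes"])
    raise KeyError("NumDeadDataNodes not found in JMX response")


def fetch_dead_datanodes(url: str = JMX_URL, timeout: float = 5.0) -> int:
    """Fetch the JMX endpoint once and return the dead DataNode count."""
    with urlopen(url, timeout=timeout) as resp:
        return extract_dead_datanodes(resp.read().decode("utf-8"))
```

A monitoring loop would call fetch_dead_datanodes() at each polling interval and alert when the result is greater than 0.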

The value was updated 6 minutes later. That is a very long time to wait before taking action. However, when I start the service again, the value changes from 1 to 0 as soon as the service is up.

Can someone confirm whether this is the normal update time?

PS: I know Ambari detects this faster. It probably uses another method to detect dead nodes. I need to confirm this before I continue parsing other metrics.

Thanks in advance.

1 ACCEPTED SOLUTION

Master Mentor

@Sedat Kestepe

- The NameNode determines whether a DataNode is dead or alive by using heartbeats.

- Each DataNode sends a heartbeat message to the NameNode every 3 seconds (default value).

- This heartbeat interval is controlled by the "dfs.heartbeat.interval" property defined in the hdfs-site.xml file.

- If a DataNode dies, the NameNode waits about 10.5 minutes (with the default settings) before removing it from the live nodes.

- The time period for determining whether a DataNode is dead is calculated as

2 * dfs.namenode.heartbeat.recheck-interval + 10 * 1000 * dfs.heartbeat.interval

The default value of "dfs.namenode.heartbeat.recheck-interval" is 300000 milliseconds (5 minutes) and the default value of "dfs.heartbeat.interval" is 3 seconds, which gives 2 * 300000 + 10 * 1000 * 3 = 630000 milliseconds.
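As a quick worked example, the timeout can be computed in Python. This is a minimal sketch: the function name is hypothetical, and it assumes the expiry interval is 2 * recheck-interval + 10 * 1000 * heartbeat-interval, which is how Hadoop's DatanodeManager computes it, with the recheck interval in milliseconds and the heartbeat interval in seconds.

```python
def heartbeat_expire_interval_ms(recheck_interval_ms: int = 300_000,
                                 heartbeat_interval_s: int = 3) -> int:
    """Milliseconds without a heartbeat before a DataNode is marked dead.

    Defaults mirror dfs.namenode.heartbeat.recheck-interval (300000 ms)
    and dfs.heartbeat.interval (3 s).
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    timeout_ms = heartbeat_expire_interval_ms()
    print(timeout_ms)            # 630000
    print(timeout_ms / 60_000)   # 10.5 (minutes)
```

Lowering either property shortens the detection time. For example, heartbeat_expire_interval_ms(recheck_interval_ms=60_000) returns 150000 milliseconds, or 2.5 minutes.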

Reference:

- https://github.com/apache/hadoop/blob/release-2.7.3-RC1/hadoop-hdfs-project/hadoop-hdfs/src/main/jav...

- http://pe-kay.blogspot.com/2016/02/dead-datanode-detection.html


3 REPLIES

Expert Contributor

Follow-up comment... Any comments?


Expert Contributor

You are awesome, thank you so much! 🙂

I was only expecting to hear whether the behaviour I saw is normal, but your explanation was like teaching me to fish instead of handing me one. I have learned how the mechanism works.

Thanks again! 😄