Support Questions

bansal_himani13 · 06-01-2018

How is DataNode failure tackled in Hadoop?

krajguru · 06-01-2018

The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. The NameNode ensures that each block is sufficiently replicated. When it detects the loss of a DataNode, it instructs the remaining nodes to maintain adequate replication by creating additional block replicas. For each lost replica, the NameNode picks a (source, destination) pair, where the source is an available DataNode holding another replica of the block and the destination is the target for the new replica.

Reference : https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Data+Replication
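
The detection and re-replication steps described above can be sketched as a small simulation. This is only an illustrative Python model, not Hadoop's actual code: the class `ToyNameNode`, its methods, and the node and block names are hypothetical. The 630-second dead-node timeout is an assumption based on the commonly cited defaults (`dfs.heartbeat.interval` of 3 s and `dfs.namenode.heartbeat.recheck-interval` of 5 min, combined as 2 × recheck + 10 × heartbeat); check your cluster's configuration before relying on it.

```python
# Toy model of NameNode heartbeat tracking and re-replication (not Hadoop code).
HEARTBEAT_INTERVAL_S = 3    # assumed default of dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300    # assumed default of dfs.namenode.heartbeat.recheck-interval
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S  # 630 s


class ToyNameNode:
    def __init__(self, replication=3):
        self.replication = replication
        self.last_heartbeat = {}  # DataNode name -> time of last heartbeat (seconds)
        self.block_map = {}       # block id -> set of DataNodes holding a replica

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def block_report(self, node, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(node)

    def live_nodes(self, now):
        # A node counts as live while its last heartbeat is within the timeout.
        return {n for n, t in self.last_heartbeat.items()
                if now - t <= DEAD_TIMEOUT_S}

    def plan_rereplication(self, now):
        """Return (block, source, destination) triples for under-replicated blocks."""
        live = self.live_nodes(now)
        plan = []
        for block_id in sorted(self.block_map):
            holders = self.block_map[block_id] & live
            if not holders:
                continue  # every replica is on a dead node; nothing to copy from
            missing = max(0, self.replication - len(holders))
            source = sorted(holders)[0]
            candidates = sorted(live - holders)
            for destination in candidates[:missing]:
                plan.append((block_id, source, destination))
        return plan


nn = ToyNameNode(replication=3)
for node in ("dn1", "dn2", "dn3", "dn4"):
    nn.heartbeat(node, now=0)
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.block_report("dn2", ["blk_1", "blk_2"])
nn.block_report("dn3", ["blk_1"])
nn.block_report("dn4", ["blk_2"])

# dn1 goes silent; the other nodes keep sending heartbeats.
for node in ("dn2", "dn3", "dn4"):
    nn.heartbeat(node, now=700)

plan = nn.plan_rereplication(now=700)
print(plan)
```

In this toy run, `dn1` has been silent for longer than the timeout, so each block it held gets one (source, destination) pair drawn from the surviving nodes. The real NameNode also takes rack placement and node load into account, which this sketch leaves out.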

sharmadukool136 · 06-01-2018

Each file stored in HDFS is split into blocks; the default block size is 128 MB (in Hadoop 2.x and later). Each block is replicated on different DataNodes, with a default replication factor of 3. Every DataNode periodically sends a heartbeat to the NameNode. When the NameNode stops receiving heartbeats from a DataNode, it treats that DataNode as down. Using the metadata it holds in memory, the NameNode identifies which blocks were stored on the failed DataNode and which other DataNodes hold replicas of those blocks. It then has those blocks copied from the surviving replicas to other DataNodes to restore the replication factor. This is how the NameNode tackles DataNode failure.
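
To make the block arithmetic and the metadata lookup concrete, here is a small Python sketch. It is a simplified model rather than HDFS code: `split_into_blocks`, `blocks_on_node`, the node names, and the example block map are all hypothetical, and the 128 MB block size is simply the default mentioned above.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size mentioned above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is split into."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover bytes
    return sizes


def blocks_on_node(block_map, failed_node):
    """For each block on failed_node, list the other nodes still holding a replica."""
    return {
        block_id: sorted(nodes - {failed_node})
        for block_id, nodes in block_map.items()
        if failed_node in nodes
    }


# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
sizes = split_into_blocks(300 * 1024 * 1024)
print([s // (1024 * 1024) for s in sizes])

# Hypothetical in-memory metadata: block id -> DataNodes holding a replica.
block_map = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
    "blk_3": {"dn1", "dn3", "dn4"},
}
affected = blocks_on_node(block_map, "dn1")
print(affected)
```

If `dn1` fails, only `blk_1` and `blk_3` lose a replica, and the lookup shows which surviving nodes could act as the copy source for each of them.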
