impact of running with a datanode offline


A datanode service on one of my cluster nodes is down because I lost a hard drive/file system.

I have a drive on order but it may be a couple of days until I have it in hand.

What is the impact of continuing to run in this state?


@Jon Page

It depends on the environment the cluster is in. Since you are asking here and it will take a couple of days for the drive to arrive, we can reasonably assume that this is a dev/sandbox type environment. Here is what's going to happen.

Because you lost a node, the cluster has lost one replica of every block that node stored. Once a data node is marked dead, Hadoop starts re-replicating those blocks onto the available nodes. This creates network traffic that can be unnecessary in some cases (as it seems to be here). To avoid that, you can increase the NameNode setting that is documented as follows:


This time decides the interval to check for expired datanodes. With this value and dfs.heartbeat.interval, the interval of deciding the datanode is stale or not is also calculated. The unit of this configuration is millisecond.

You can increase this value to lengthen the time it takes for a data node to be marked dead, which effectively buys you more time to replace the drive. The problem is that this setting requires a restart, so it's a little late.
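To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It rests on two assumptions that are not stated in the post: that the setting quoted above is dfs.namenode.heartbeat.recheck-interval (the description matches that property's entry in hdfs-default.xml, default 300000 ms), and that the NameNode declares a node dead after 2 x recheck interval + 10 x dfs.heartbeat.interval (default 3 s). The function names are made up for illustration.

```python
# Sketch only: assumes the dead-node timeout is
#   2 * dfs.namenode.heartbeat.recheck-interval (ms)
#   + 10 * dfs.heartbeat.interval (s, converted to ms)
# and that the defaults are 300000 ms and 3 s respectively.

def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Milliseconds of silence before a datanode is treated as dead."""
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000


def recheck_interval_for(target_timeout_ms, heartbeat_interval_s=3):
    """Smallest recheck interval (ms) that gives at least the target timeout."""
    remaining = target_timeout_ms - 10 * heartbeat_interval_s * 1000
    if remaining <= 0:
        return 0
    # Ceiling division so the resulting timeout is never below the target.
    return -(-remaining // 2)


if __name__ == "__main__":
    default_ms = dead_node_timeout_ms()
    print("default timeout:", default_ms, "ms =", default_ms / 60_000, "minutes")

    two_days_ms = 48 * 60 * 60 * 1000
    print("recheck interval for a 48 h window:", recheck_interval_for(two_days_ms), "ms")
```

With the assumed defaults the window is about ten and a half minutes, which is why changing the value after the node has already gone down does not help much.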

For now, you just run with under-replicated blocks. Unless you lose more disks, you should not lose data. There is of course still a risk: assuming the default replication factor of 3, a block is lost if the two disks holding its remaining replicas also fail.
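The reasoning about remaining replicas can be sketched in a few lines of Python. This is a toy model, not anything HDFS exposes: the block IDs, node names and placement map are hypothetical, and a real cluster would report this through its own tooling.

```python
# Toy model: a block is lost only when every node holding a replica has failed.
# The placement map below is hypothetical example data.

def under_replicated(placement, failed_nodes, replication=3):
    """Map block -> live replica count for blocks below the target but not lost."""
    failed = set(failed_nodes)
    result = {}
    for block, nodes in placement.items():
        live = len(set(nodes) - failed)
        if 0 < live < replication:
            result[block] = live
    return result


def lost_blocks(placement, failed_nodes):
    """Blocks whose replicas all sit on failed nodes."""
    failed = set(failed_nodes)
    return sorted(b for b, nodes in placement.items() if set(nodes) <= failed)


placement = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn1", "dn3", "dn4"],
    "blk_3": ["dn2", "dn3", "dn4"],
}

if __name__ == "__main__":
    # One node down: some blocks drop to two replicas, nothing is lost.
    print(under_replicated(placement, ["dn1"]), lost_blocks(placement, ["dn1"]))
    # Three nodes down: a block whose replicas were all on those nodes is gone.
    print(lost_blocks(placement, ["dn1", "dn2", "dn3"]))
```

In this model, losing dn1 alone leaves every block readable, and data only disappears once the nodes holding the other two replicas of the same block fail as well.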