What causes a datanode to consider a volume as failed

Rising Star

Hi.

We have encountered issues on our cluster that seem to be caused by bad disks.

When we run "dmesg" on the datanode host we see warnings such as:

This should not happen!!  Data will be lost
sd 1:0:20:0: [sdv]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:20:0: [sdv]  Sense Key : Medium Error [current] 
Info fld=0x2f800808
sd 1:0:20:0: [sdv]  Add. Sense: Unrecovered read error
sd 1:0:20:0: [sdv] CDB: Read(10): 28 00 2f 80 08 08 00 00 08 00
end_request: critical medium error, dev sdv, sector 796919816
EXT4-fs (sdv1): delayed block allocation failed for inode 70660422 at logical offset 2049 with max blocks 2048 with error -5

In the datanode logs we see warnings such as:

2016-05-16 09:41:42,694 WARN  util.Shell (DU.java:run(126)) - Could not get disk usage information
ExitCodeException exitCode=1: du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir163': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir155': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir165': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir166': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir164': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir159': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir154': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir153': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir167': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir161': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir157': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir152': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir160': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir156': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir158': Input/output error

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.fs.DU.run(DU.java:190)
        at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:119)
        at java.lang.Thread.run(Thread.java:745)

and:

2016-05-16 09:31:14,494 ERROR datanode.DataNode (DataXceiver.java:run(253)) - datavault-prod-data8.internal.machines:1019:DataXceiver error processing READ_BLOCK operation  src: /x.x.x.x:55220 dst: /x.x.x7.x:1019
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1356445971-x.x.x.x-1430142563027:blk_1367398616_293808003
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:431)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:229)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
        at java.lang.Thread.run(Thread.java:745)

These errors and warnings do not, however, seem to be enough for the datanode to consider a volume as "failed" and shut itself down. Some consequences that we have seen when this happens are that it is impossible to scan an HBase region served by a regionserver on the same host as the datanode, and that MapReduce jobs get stuck accessing the host.

This brings me to my question: What is the requirement for a datanode to consider a volume as failed?

Best Regards

/Thomas

1 ACCEPTED SOLUTION

Hi @Thomas Larsson, the DataNode will perform a simple disk check operation in response to certain IO errors. The disk check verifies that the DataNode's storage directory root is readable, writable and executable. If any of these checks fails, the DataNode will mark the volume as failed.

HDFS failed disk detection can be better than it is today. We have seen instances where these checks are insufficient to detect volume failures. It is a hard problem in general since disks fail in byzantine ways where some but not all IOs may fail or a subset of directories on the disk become inaccessible.
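
To illustrate the gap described above, here is a minimal sketch in plain Java (no Hadoop classes) that contrasts a root-only permission check with a deeper walk that tries to stat each entry. The class name `VolumeProbe`, its method names, and the depth limit are hypothetical and chosen for illustration; this is not how the DataNode itself is implemented. It assumes that a failing `stat` on a damaged subdirectory surfaces in Java as an `IOException`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class VolumeProbe {

    /** Root-only check: the directory exists and is readable, writable and executable. */
    static boolean rootCheck(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /** Walks at most maxDepth levels below dir and returns entries that could not be inspected. */
    static List<File> findUnreadable(File dir, int maxDepth) {
        List<File> bad = new ArrayList<>();
        collect(dir, maxDepth, bad);
        return bad;
    }

    private static void collect(File dir, int depth, List<File> bad) {
        File[] children = dir.listFiles();
        if (children == null) {
            // listFiles returns null when dir is not a directory or an I/O error occurs
            bad.add(dir);
            return;
        }
        for (File child : children) {
            try {
                // Reading the attributes forces a stat of the entry itself
                BasicFileAttributes attrs = Files.readAttributes(
                        child.toPath(), BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                if (attrs.isDirectory() && depth > 1) {
                    collect(child, depth - 1, bad);
                }
            } catch (IOException e) {
                bad.add(child);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args[0]);
        System.out.println("root check passed: " + rootCheck(root));
        for (File f : findUnreadable(root, 3)) {
            System.out.println("could not inspect: " + f);
        }
    }
}
```

On a volume where only some subdirectories return I/O errors, the root check can still pass while the deeper walk reports the damaged entries. A full walk is expensive on a large volume, which is one reason a depth limit is used in this sketch.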

16 REPLIES

Super Guru
@Thomas Larsson
  1. If the NameNode is not able to get a heartbeat from the DataNode
  2. If the DataNode is not able to send a block report to the NameNode within the specified time [here, the DataNode might not be able to send the block report because of a bad disk]

then the DataNode appears to be down or unresponsive.

Rising Star

Hi Sagar,

I think you misunderstand my question. My question was NOT "In what scenarios does a namenode consider a datanode dead?".

It's more a question of why our datanode does not shut itself down when one of its disks is failing. I assumed that this is what should happen, since our setting of

dfs.datanode.failed.volumes.tolerated

is the default, i.e. zero.

Rising Star

A follow-up.

I forgot to mention our hadoop version: HDP 2.2.6.0, i.e. hadoop 2.6.

I looked into the hadoop code and found the org.apache.hadoop.util.DiskChecker class, which seems to be used by a monitoring thread to monitor the health of a datanode's disks.

In order to try to verify that the datanode actually does not detect this error, I created a very simple Main class that just calls the DiskChecker.checkDirs method.

Main.java:

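import java.io.File;

import org.apache.hadoop.util.DiskChecker;

public class Main {

  // Runs Hadoop's DiskChecker against the directory given as the first argument.
  // checkDirs throws a DiskErrorException if a check fails.
  public static void main(String[] args) throws Exception {
    DiskChecker.checkDirs(new File(args[0]));
  }
}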

If I run this class on one of our problematic directories, nothing is detected:

[thomas.larsson@datavault-prod-data8 ~]$ /usr/jdk64/jdk1.7.0_67/bin/javac Main.java -cp /usr/hdp/2.2.6.0-2800/hadoop/hadoop-common.jar
[thomas.larsson@datavault-prod-data8 ~]$ sudo java -cp .:/usr/hdp/2.2.6.0-2800/hadoop/hadoop-common.jar:/usr/hdp/2.2.6.0-2800/hadoop/lib/* Main /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

However, trying to list the files in this subdir looks like this:

[thomas.larsson@datavault-prod-data8 ~]$ sudo ls -la /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir163: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir155: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir165: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir166: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir164: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir159: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir154: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir153: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir167: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir161: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir157: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir152: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir160: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir156: Input/output error
ls: cannot access /mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir158: Input/output error
total 984
drwxr-xr-x. 258 hdfs hadoop 12288 13 dec 12.52 .
drwxr-xr-x. 258 hdfs hadoop 12288 22 nov 14.50 ..
drwxr-xr-x.   2 hdfs hadoop  4096 12 maj 18.12 subdir0
drwxr-xr-x.   2 hdfs hadoop  4096 12 maj 18.02 subdir1
...
drwxr-xr-x.   2 hdfs hadoop  4096 30 apr 19.21 subdir151
d??????????   ? ?    ?          ?            ? subdir152
d??????????   ? ?    ?          ?            ? subdir153
d??????????   ? ?    ?          ?            ? subdir154
d??????????   ? ?    ?          ?            ? subdir155
d??????????   ? ?    ?          ?            ? subdir156
d??????????   ? ?    ?          ?            ? subdir157
d??????????   ? ?    ?          ?            ? subdir158
d??????????   ? ?    ?          ?            ? subdir159
drwxr-xr-x.   2 hdfs hadoop  4096 12 maj 18.12 subdir16
d??????????   ? ?    ?          ?            ? subdir160
d??????????   ? ?    ?          ?            ? subdir161
d??????????   ? ?    ?          ?            ? subdir162
d??????????   ? ?    ?          ?            ? subdir163
d??????????   ? ?    ?          ?            ? subdir164
d??????????   ? ?    ?          ?            ? subdir165
d??????????   ? ?    ?          ?            ? subdir166
d??????????   ? ?    ?          ?            ? subdir167
drwxr-xr-x.   2 hdfs hadoop  4096 12 maj 18.30 subdir168
drwxr-xr-x.   2 hdfs hadoop  4096 12 maj 18.28 subdir169
...

So, it seems like this problem is undetectable by a datanode.

Super Collaborator

@Thomas Larsson From the above listing of the mount point, it seems that the volume as a whole is still accessible; only some of its subdirectories are inaccessible.

What is the output of this command: sudo ls -la /mnt/data21/

Rising Star

Yes, I agree that is exactly how it seems. There is no problem running ls directly on /mnt/data21.

[thomas.larsson@datavault-prod-data8 ~]$ ls -la /mnt/data21
total 28
drwxr-xr-x.  4 root root  4096  9 nov  2015 .
drwxr-xr-x. 26 root root  4096  9 nov  2015 ..
drwxr-xr-x.  4 root root  4096 28 jan 12.32 hadoop
drwx------.  2 root root 16384  6 nov  2015 lost+found

avatar

In what scenarios does a namenode consider a datanode dead?

1) If the namenode is not able to get a heartbeat from the datanode.

2) A read-only file system on the datanode's JBOD disks.

Why does our datanode not shut itself down when one of its disks is failing?

1) If you have a single hard disk on your datanode, then it will go down automatically.

2) A datanode stores HDFS data in a replicated manner. Since the data is replicated across datanodes, that may be the reason the datanode does not shut itself down.

Rising Star

Hi Ashnee.

See my comment to Sagar above.

Master Guru

Can you check your setting for dfs.datanode.failed.volumes.tolerated in hdfs-site.xml? DN will shut itself down if the number of detected failed disks is larger than this setting. If you want DN to stop on a first failed disk, be sure to set this property to zero. Now, if it's already zero, something else could be wrong.
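
For reference, a fragment like the following in hdfs-site.xml sets the property explicitly. This is only a sketch of the standard Hadoop property format; the value shown is the default of zero discussed in this thread.

```xml
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- 0 means the DataNode stops as soon as one volume is detected as failed -->
  <value>0</value>
</property>
```

Note that this setting only takes effect for volumes the DataNode has actually detected as failed.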

Rising Star

Hi Predrag,

See my comment to Sagar above; our value of that setting is the default, i.e. zero.