Created 05-16-2016 09:14 AM
Hi.
We have encountered issues on our cluster that seem to be caused by bad disks.
When we run "dmesg" on the datanode host, we see warnings such as:
This should not happen!! Data will be lost
sd 1:0:20:0: [sdv] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:20:0: [sdv] Sense Key : Medium Error [current]
Info fld=0x2f800808
sd 1:0:20:0: [sdv] Add. Sense: Unrecovered read error
sd 1:0:20:0: [sdv] CDB: Read(10): 28 00 2f 80 08 08 00 00 08 00
end_request: critical medium error, dev sdv, sector 796919816
EXT4-fs (sdv1): delayed block allocation failed for inode 70660422 at logical offset 2049 with max blocks 2048 with error -5
In the datanode logs we see warnings such as:
2016-05-16 09:41:42,694 WARN util.Shell (DU.java:run(126)) - Could not get disk usage information
ExitCodeException exitCode=1: du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir162': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir163': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir155': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir165': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir166': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir164': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir159': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir154': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir153': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir167': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir161': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir157': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir152': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir160': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir156': Input/output error
du: cannot access `/mnt/data21/hadoop/hdfs/data/current/BP-1356445971-x.x.x.x-1430142563027/current/finalized/subdir58/subdir158': Input/output error
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.fs.DU.run(DU.java:190)
    at org.apache.hadoop.fs.DU$DURefreshThread.run(DU.java:119)
    at java.lang.Thread.run(Thread.java:745)
and :
2016-05-16 09:31:14,494 ERROR datanode.DataNode (DataXceiver.java:run(253)) - datavault-prod-data8.internal.machines:1019:DataXceiver error processing READ_BLOCK operation src: /x.x.x.x:55220 dst: /x.x.x7.x:1019
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1356445971-x.x.x.x-1430142563027:blk_1367398616_293808003
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:431)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:229)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:493)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:745)
These errors/warnings do not, however, seem to be enough for the datanode to consider a volume "failed" and shut itself down. Some consequences we have seen when this happens are that it becomes impossible to scan an HBase region served by a regionserver on the same host as the datanode, and that mapreduce jobs get stuck accessing the host.
This brings me to my question: what is the requirement for a datanode to consider a volume as failed?
Best Regards
/Thomas
Created 05-23-2016 06:54 AM
Terribly sorry, I hadn't seen that comment. In that case, the DataNode couldn't detect your disk as failed.
Created 06-15-2016 07:08 AM
If some folders go into read-only mode while the hard disk is still mounted as read-write, this indicates that the mount options for the filesystem need to be verified. Check the mount options set for the hard disk. The whole disk should go into complete read-only mode when there are I/O issues with some of its folders or partitions.
HDFS will also see the hard disk in read-only mode and will then act according to the property "dfs.datanode.failed.volumes.tolerated", i.e. if dfs.datanode.failed.volumes.tolerated is 0, it will stop the datanode process as soon as it finds the hard disk in read-only mode.
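For reference, here is a minimal way to check how that tolerance is currently set on a node. This is just a sketch, assuming the standard HDFS command-line client is available on the DataNode host:

# Print how many failed volumes the DataNode tolerates before shutting down.
# The default is 0, i.e. a single failed volume stops the DataNode process.
hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated

The value itself is configured in hdfs-site.xml on the DataNodes.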
Created 06-16-2016 07:03 AM
Hi Ravi,
I'm not sure I understand what you mean. Is there a tool that could detect our type of disk error and automatically remount the drive in read-only mode? Or are you talking about something like the fstab mount option "errors=remount-ro"? The fstab option only means that if errors are encountered when the OS tries to mount the drive in read-write mode, it should try to mount it as read-only. But that does not apply to our situation, since our machine is not just starting up; it has been up and running for a long while before the disk errors start to occur. If you mean some other tool or configuration that can detect and remount while the system is running, please share a link.
Best Regards
Created 06-28-2016 07:41 AM
The filesystem, when it sees any hard-disk error, should go into read-only mode. If it does not go into read-only mode, that is likely due to the mount options.
The filesystem should be mounted with the option errors=remount-ro. This means that if an error is detected, the filesystem will be remounted read-only.
If the filesystem goes into read-only mode, HDFS will identify that as well. Without the OS marking the filesystem as read-only, HDFS will not be able to identify the failure.
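As an illustration, an /etc/fstab entry for a data disk mounted with that option could look like the sketch below. The device name and mount point are only placeholders based on the logs earlier in this thread, so adjust them to your own layout:

# Example /etc/fstab entry for a DataNode data disk (device and mount point are placeholders)
/dev/sdv1  /mnt/data21  ext4  defaults,noatime,errors=remount-ro  0  2

# Show the error behavior the filesystem is currently configured with
tune2fs -l /dev/sdv1 | grep -i 'errors behavior'

# Show whether the mount is currently read-write (rw) or read-only (ro)
grep /mnt/data21 /proc/mounts

Note that the tune2fs output reflects the ext4 superblock setting, which can also be changed with "tune2fs -e remount-ro" instead of (or in addition to) the fstab option.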
Created 06-29-2016 08:40 PM
Hi @Thomas Larsson, the DataNode will perform a simple disk check operation in response to certain IO errors. The disk check verifies that the DataNode's storage directory root is readable, writable and executable. If any of these checks fails, the DataNode will mark the volume as failed.
HDFS failed-disk detection could be better than it is today. We have seen instances where these checks are insufficient to detect volume failures. It is a hard problem in general, since disks fail in byzantine ways where some, but not all, IOs may fail, or only a subset of directories on the disk becomes inaccessible.
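To make that concrete, below is a rough shell approximation of what the storage-directory check amounts to. The path and the "hdfs" user are assumptions based on the logs in this thread, and the real check runs inside the DataNode process (via Hadoop's DiskChecker utility), so treat this as a sketch of the idea rather than the actual code path:

# The DataNode's basic volume check: the storage directory root must exist and be
# readable, writable and executable (listable) for the DataNode user.
DIR=/mnt/data21/hadoop/hdfs/data   # example storage directory from this thread
# assumes the DataNode runs as the "hdfs" user
sudo -u hdfs sh -c "test -d $DIR && test -r $DIR && test -w $DIR && test -x $DIR" \
  && echo "volume passes the basic check" \
  || echo "volume would be marked as failed"

In a scenario like the one above, where only some subdirectories return I/O errors, the storage directory root can still pass all of these tests, which would explain why the volume is never marked as failed.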
Created 07-13-2016 05:31 PM
Thanks, nice answer.
Created 08-11-2016 12:52 PM