HDFS: too many bad blocks due to checksum errors - understanding verifyMeta behaviour
Labels: HDFS
Created on 12-28-2018 04:28 AM - edited 09-16-2022 07:01 AM
We are trying to set up a Hadoop installation and are using CDH 5.15.1. We have recently noticed that a lot of blocks are flagged as "bad" due to checksum errors AFTER they have been finalized. Our installation is small: eight DataNodes, each with a single 50 TB SCSI disk (the disk is actually part of a SAN) on an ext4 filesystem, and a replication factor of 3.
I understand that a volume scanner runs on each DataNode and checks the integrity of individual blocks by comparing the checksum stored in the meta file against the checksum computed from the actual block data. I am also aware of the "hdfs debug verifyMeta" command, which we can run on a DataNode to check a block against its stored checksums.
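For reference, this is roughly how we run it on a DataNode; the data directory and the block/meta file names below are placeholders, not our real paths:

# Locate the replica's block and meta files under the DataNode data directory
# (/data/dfs/dn stands in for our dfs.datanode.data.dir)
find /data/dfs/dn -name 'blk_XXXXXXXX*'

# Verify the block data against the checksums stored in its meta file
hdfs debug verifyMeta -block /data/dfs/dn/current/.../blk_XXXXXXXX -meta /data/dfs/dn/current/.../blk_XXXXXXXX_YYY.meta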
Once we had a few files flagged as corrupted due to missing blocks, I picked one block and checked all the nodes where it lived. On each node the actual block file had the same size and creation time but a different MD5 hash (obtained by running md5sum blk_XXXXXXXX). The meta files all had the same MD5 checksum. Also, all three copies failed the verifyMeta test: hdfs debug verifyMeta -block blk_XXXXXX -meta blk_XXXXX_YYY.meta threw a checksum error on each of them.
Curious, I scanned one node for more failing blocks and found a bunch. I concentrated on one block (blk_1073744395), which belonged to file A. I tracked the block to three nodes; all three had different MD5 hashes for the block file, and all three failed the verifyMeta test. hdfs fsck -blockId returned a HEALTHY status for all three replicas. I then fetched the file from HDFS with -copyToLocal. The logs indicated that node1 threw a checksum error but node2 fulfilled the request correctly. The replica was then removed from node1 and re-replicated onto node4, where again I found that it had a different MD5 and failed the verifyMeta test.
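For completeness, this is the rough sequence I followed; the hostnames and the DataNode data directory below are placeholders:

# Find which DataNodes hold replicas of the block
hdfs fsck -blockId blk_1073744395

# On each of those DataNodes, locate the replica and hash it
ssh node1 "find /data/dfs/dn -name blk_1073744395 | xargs md5sum"
ssh node2 "find /data/dfs/dn -name blk_1073744395 | xargs md5sum"
ssh node3 "find /data/dfs/dn -name blk_1073744395 | xargs md5sum"

# Read the file back through HDFS; the client verifies checksums while reading
hdfs dfs -copyToLocal /path/to/fileA /tmp/fileA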
My questions are:
- Is it possible for the verifyMeta check to fail but the actual checksum verification (done while serving the block to a client) to pass on the DataNode, as we saw?
- Should all replicas of a block have the same hash (say MD5)?
- What may be causing finalized blocks to start failing checksum verification if the disk is healthy?
I would be grateful if someone could shed some light on the behaviour we are seeing on our DataNodes.
Created 12-28-2018 10:21 AM
First of all, CDH didn't support SAN until very recently, and even now the support is limited.
> Is it possible for the verifyMeta check to fail but the actual checksum verification (done while serving the block to a client) to pass on the DataNode, as we saw?
I won't say that's impossible, but we've not seen such a case. The verifyMeta implementation is actually quite simple.
> Should all replicas of a block have the same hash (say MD5)?
If all replicas have the same size, they are supposed to have the same checksum. (We support append, not truncate.) If your SAN device is busy, there is a chance that the HDFS client gives up writing to that DataNode, moves the write pipeline to a different DataNode, and continues from there. In that case replicas may end up with different file lengths, because some of them are stale.
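One way to check for that is to compare the length each replica actually has on disk; the paths and hostnames below are placeholders:

# List the block, its locations and its size as the NameNode sees them
hdfs fsck /path/to/fileA -files -blocks -locations

# Compare the on-disk length of the replica on each DataNode
ssh node1 "stat -c '%n %s' /data/dfs/dn/current/.../blk_XXXXXXXX"
ssh node2 "stat -c '%n %s' /data/dfs/dn/current/.../blk_XXXXXXXX"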
> What may be causing finalized blocks to start failing checksum verification if the disk is healthy?
An underperforming disk or a busy DataNode could cause the write to that block to be aborted. I can't give you a definitive answer because I don't have much experience with HDFS on SAN.
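If you want to rule out the storage side, something generic like the following on each DataNode can show whether the SAN LUN is keeping up; this is just an OS-level sketch, not CDH-specific:

# Watch per-device latency and utilisation while ingest is running
iostat -x 5

# Look for I/O or filesystem errors reported by the kernel
dmesg | grep -i -E 'i/o error|ext4'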
Created 12-29-2018 01:05 AM
We use Logstash to ship Bro event logs to Hadoop. Earlier we used the WebHDFS interface, but after encountering a lot of bad blocks (we thought APPEND might be causing something) we switched to a simple hdfs dfs -put. Logstash writes hourly log files (JSON lines), we ship them to a gateway node and do a -put, so I don't think it's a client issue.
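The ingest is essentially the following; the paths are placeholders, and the get/md5sum step is just a sanity check we can run to compare hashes end to end:

# Hourly ingest from the gateway node
hdfs dfs -put /data/bro/conn_2018-12-28_10.json /bro/raw/

# Optional end-to-end check: read the file back and compare hashes
hdfs dfs -get /bro/raw/conn_2018-12-28_10.json /tmp/check.json
md5sum /data/bro/conn_2018-12-28_10.json /tmp/check.json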
I am still a little surprised that a block that failed the verifyMeta test was deemed OK by HDFS and actually served.
I don't see any slow writes in the logs, but I do see nodes in the write pipelines complaining about bad checksums (while writing) and giving up.
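For what it's worth, this is roughly how I trace a single block through the pipeline; the log location is an assumption based on our setup and may differ on other clusters:

# Trace one block id through the DataNode logs on each pipeline node
grep blk_1073744395 /var/log/hadoop-hdfs/*DATANODE*.log*

# Narrow it down to checksum-related messages
grep blk_1073744395 /var/log/hadoop-hdfs/*DATANODE*.log* | grep -i checksum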
Created 12-29-2018 12:27 PM
The WebHDFS append operation was prone to a corruption bug (HDFS-11160), but that was fixed in CDH 5.11.0.
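You can double-check which build the cluster is actually running with, for example:

# Confirm the Hadoop/CDH build in use
hadoop version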
> I don't see any slow writes in the logs, but I do see nodes in the write pipelines complaining about bad checksums (while writing) and giving up.
That's an interesting observation. Checksum errors should be a very rare event, if they happen at all. Without further details, I would suspect the SAN has something to do with it. It's just such a rare setup in our customer install base that it's hard for me to tell what the effect would be.