
In HDFS, why do corrupted blocks happen?

Solved

I'm just curious to know what the main cause of corrupted blocks would be :-)

Since every block should have at least three replicas, I thought corruption would be rare (but it happens).

If HDFS can't create one of the copies, or detects corruption, wouldn't it try to recover by copying a good replica to another DataNode?

Or, once a file has been written successfully, does HDFS never check whether it is corrupted until the cluster is restarted?

1 ACCEPTED SOLUTION

Re: In HDFS, why do corrupted blocks happen?

A corrupted block means that HDFS cannot find any valid replica containing that block's data. Since the replication factor is typically 3, and since the default replica placement policy spreads those replicas across different machines and racks, it's very unlikely that every replica of a block gets corrupted on typical files.

Some users may choose to use a replication factor of 1 for non-critical files that can be recreated from some other system of record. This is an optimization that can save storage at the cost of reduced fault tolerance. If replication factor is only 1, then each block in that file is a single point of failure. Loss of the node hosting that 1 replica will cause the block to be reported as corrupted.

HDFS can detect corruption of a replica caused by bit rot due to physical media failure. In that case, the NameNode will schedule re-replication work to restore the desired number of replicas by copying from another DataNode with a known good replica.
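To illustrate the replication trade-off described above, the per-file replication factor can be changed from the command line (the path here is hypothetical, and these commands assume a running cluster):

```shell
# Store a recreatable dataset with a single replica: saves space,
# but every block becomes a single point of failure
hadoop fs -setrep -w 1 /data/scratch/tmp-output

# Restore the usual replication factor of 3; -w waits until
# re-replication has actually completed
hadoop fs -setrep -w 3 /data/scratch/tmp-output
```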

3 REPLIES

Re: In HDFS, why do corrupted blocks happen?

The hdfs fsck operation doesn't read block data to check for corruption; that would take too long. It examines only the namespace metadata and the block information the NameNode already has.
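For example (assuming a running cluster and appropriate permissions), fsck reports the NameNode's current view of block health without touching the data itself:

```shell
# Report the overall health of a directory tree
hdfs fsck /user/alice

# List only the files that have corrupt blocks, i.e. blocks
# for which no valid replica is left
hdfs fsck / -list-corruptfileblocks

# Show per-file block IDs and the DataNodes hosting each replica
hdfs fsck /user/alice -files -blocks -locations
```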

Blocks are checked for corruption whenever they are read: small CRC checksum files are created for chunks of each block and validated on read() operations. If you work with the file:// filesystem you can see these same checksum files in your local FS. If a replica is found to be corrupt on a read, the DFS client reports it to the NameNode and reads the data from another replica instead. As Chris said, the NameNode then schedules the remaining good replica for re-replication, as if the block were under-replicated. The corrupted replica doesn't get deleted until that re-replication succeeds. Why not? If all replicas are corrupt, then maybe you can still salvage something from the corrupt copies of the block.
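The per-chunk checksum idea can be sketched in a few lines. This is a toy model, not HDFS's actual code (HDFS uses CRC32C over `dfs.bytes-per-checksum` bytes, 512 by default, stored in a `.meta` file next to each replica); the function names here are made up for illustration:

```python
import zlib

BYTES_PER_CHECKSUM = 512  # mirrors the HDFS default chunk size

def compute_checksums(block: bytes) -> list:
    # One CRC32 per 512-byte chunk, like the .meta file kept
    # alongside each block replica
    return [zlib.crc32(block[i:i + BYTES_PER_CHECKSUM])
            for i in range(0, len(block), BYTES_PER_CHECKSUM)]

def read_with_verification(block: bytes, checksums: list) -> bytes:
    # Re-verify every chunk on read and fail loudly on a mismatch,
    # roughly what a DFS client does before handing data to the caller
    for idx, actual in enumerate(compute_checksums(block)):
        if actual != checksums[idx]:
            raise IOError(f"Checksum error in chunk {idx}")
    return block

data = b"x" * 2000
meta = compute_checksums(data)          # "written" alongside the block
assert read_with_verification(data, meta) == data

# Simulate bit rot: flip one byte inside the second chunk
rotted = data[:600] + b"y" + data[601:]
try:
    read_with_verification(rotted, meta)
except IOError as e:
    print(e)  # chunk 1 fails verification
```

In the real system a mismatch doesn't abort the read; the client switches to another replica and reports the bad one to the NameNode.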

DataNodes also scan all blocks in the background; they just do it fairly slowly by default so that applications don't suffer. The scan ensures that corrupted blocks are usually found before programs read them, and that problems with "cold" data are found at all. It's designed to avoid the situation where every replica gets corrupted and nobody notices until it's too late to fix.

Look in the HDFS configuration reference (hdfs-default.xml) for details on the two options you need to adjust:

dfs.datanode.scan.period.hours
dfs.block.scanner.volume.bytes.per.second
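In hdfs-site.xml these look roughly like the following; the values shown are the shipped defaults, not recommendations:

```xml
<!-- hdfs-site.xml fragment -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <!-- 504 hours = 3 weeks between full scans of each block -->
  <value>504</value>
</property>
<property>
  <name>dfs.block.scanner.volume.bytes.per.second</name>
  <!-- throttle scanning to 1 MiB/s per volume so reads don't suffer -->
  <value>1048576</value>
</property>
```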

How disks fail and data gets corrupted is a fascinating problem in its own right, and well worth reading up on if you really want to learn more about it.

I'd also recommend you look at some of the work on memory corruption; that's enough to make you think that modern laptops and desktops should be using ECC RAM.

Re: In HDFS, why do corrupted blocks happen?

Thank you very much!