Hello,
Is there a way to validate the HDFS data directories, before HDFS or the rest of the CDH services are started, to ensure that no missing blocks will be reported?
Let's say I have 10 racks with 10 workers per rack. I want to reboot each worker, and I would like to run data disk integrity checks before and after each reboot. If the results differ after a reboot, then I know filesystem corruption occurred and I really shouldn't go on to reboot any further workers. Assuming rack awareness is enabled, what commands could I run offline to give me 100% certainty that no missing blocks will be reported for a given node?
What will give me that 100% certainty? Running storage commands to detect drive failures? smartctl? File checksums (sha256sum)? Filesystem integrity checks (fsck)?
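To make the file checksum idea concrete, here is a rough Python sketch of the kind of before/after comparison I have in mind. It is only an illustration under my own assumptions: the function names `build_manifest` and `diff_manifests` are made up, and the data directory path in the usage note is just an example. I don't know whether a check like this is sufficient, which is really what I'm asking.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            # Read in 1 MiB chunks so large block files are not loaded whole.
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def diff_manifests(before, after):
    """Return (missing, added, changed) lists of relative paths, each sorted."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return missing, added, changed
```

The idea would be to call `build_manifest("/data/1/dfs/dn")` (example path) with the DataNode stopped before the reboot, save the result, call it again after the reboot, and only move on to the next worker if `diff_manifests` returns three empty lists.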
What would be the equivalent of the HDFS missing blocks check but without HDFS or any other CDH services running?
I would also like to know how the Cloudera missing blocks check works. Does it verify only that each block file exists on the correct data disk and in the correct directory, or does it also verify that the file's checksum and integrity match what is expected?
Thx,