Created on 06-22-202310:08 AM - edited on 06-27-202304:42 AM by VidyaSargur
High level Problem Explanation
The occurrence of the "block missing" and "block corruption" alerts is observed in the Cloudera Manager dashboard or the namenode webui. While there isn't a common cause, these alerts can be due to any of the outlined issues.
Detailed Technical Explanation
When the namenode does not receive reports for all of the block replicas of a file, it marks those blocks as missing. Similarly, when all the block replicas are corrupted, the namenode marks them as corrupt. However, if at least one of the block replica(1/3) is reported as being healthy, then there is no missing or corruption alert.
There are multiple scenarios in which blocks can be marked as missing and corrupt.
During a major upgrade, users may encounter block missing errors. This issue arises due to insufficient datanode heap.
The upgrade process involves migrating the disk layout and requires additional datanode heap space. users who are upgrading to releases CDP 7.1.7 and above will not experience these issues. However, for other users, it is necessary to increase the datanode heap size by 2x from current heap. Once the upgrade is completed, the heap size can be reverted to its original value.
The adjustment in heap size is required specifically for the layout upgrade from -56 (256x256) to -57 (32x32). During this process, each volume's blocks are scanned, and a list of LinkArgs and file objects are stored as String which resulting in significant heap utilization.
If a datanode crashes with an Out of Memory error during the upgrade, then you experience a missing blocks. This occurs because the blocks were in the old layout, but the datanode assumes the blocks were in the new layout. If you encounter this issue, it is recommended to contact Cloudera for assistance in fixing the block layout.
User manually replaces the old fsimage in the namenode's current directory, which resulted inconsistent behavior and shows missing and corrupt blocks after reboot.
In rare instances, users may encounter a scenario where the dfs.checksum.type is set to NULL, so disabling the safety mechanisms in place results block corruption.
HDFS incorporates a built-in checksum mechanism that is enabled by default. When a client writes data, it calculates a CRC checksum for the data. Additionally, the last datanode in the write pipeline also calculates a checksum for the received data. If the checksums from the client and datanode match, the data is considered valid and is stored on disk along with the checksum information.
Checksums play a crucial role in identifying various hardware issues, including problems with network cables, incompatible MTU network settings, and faulty disks or controllers. It is highly recommended to enable checksums on an HDFS cluster to ensure data integrity.
However, when the checksum.type is set to NULL, all these safety mechanisms are effectively disabled. Without checksums, HDFS is unable to detect corruption. This means that data can be silently corrupted during tasks such as data migration between hosts or due to disk failures.
To identify this block corruption and checksum, you can use the "hdfs debug verify" command to verify the metadata of a block.
For assistance in efficiently checking the block checksums of the entire HDFS filesystem, we recommend reaching out to the Cloudera storage team. We have developed a specialized tool designed to perform this task quickly and effectively.
When the size of a block report exceeds the limit set by "ipc.maximum.data.length" (default value of 64 MB), the namenode rejects those block reports which results missing blocks. This rejection is typically accompanied by log entries in both the namenode and datanode logs indicating that the "Protocol message was too large."
In scenarios where a single datanode holds over 15 million blocks, it becomes more likely to encounter block reports that surpass the configured limit. This can result in the rejection of such block reports by the namenode due to their excessive size.
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: RemoteException in offerService org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.lang.IllegalStateException: com.google.protobuf.InvalidProtocolBufferException:
Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
There are instances where a user administrator may accidentally delete disk data with the intention of reclaiming disk space, but without a complete understanding of Hadoop data storage in the disk. This leads to the loss of blocks.
The filesystem appears to be in a healthy state, but when running FSCK, it detects a missing block. This issue arises from a snapshot reference where some of its blocks are missing.
A file named "f1" is created, and its corresponding block ID is "blk_100005".
A snapshot named "s1" is taken for "f1".
Running FSCK shows that the filesystem is healthy, as expected.
However, at a later point, "blk_100005" goes missing due to unknown reasons.
Consequently, the filesystem becomes corrupted, and FSCK accurately reports the corrupt filename and its block.
The next step involves removing the corrupt file "f1" using the command: hdfs dfs -rm -skipTrash f1.
Create or reload a new file, "f1", which creates a new block labeled "blk_100006".
The problem arises at this stage: FSCK continues to identify file "f1" as corrupt even after reloading it with a new block.
This scenario presents a false positive, where FSCK should appropriately report missing references, whether they stem from live data or a snapshot. The issue described here HDFS-15770, a minor improvement that requires fix.
The occurrence of a "Too many open file" exception in the DataNode can lead to block deletion, ultimately resulting in missing blocks. This issue has been addressed in the CDP.
In a rare scenario that has not been conclusively proven yet, a user experienced a loss of multiple racks in their cluster due to rack failure. After restarting the DataNode following a power event, the blocks were found in the "rbw" (Replica Being Written) directory.
All the blocks from the "rbw" directory transitioned into the "RWR" (Replica Waiting to be Recovered) state. As a result, the NameNode marked these blocks as corrupt, citing the reason as "Reason.INVALID_STATE".
However, the issue lies in the fact that the NameNode only removes a replica from the corrupt map if the reason for corruption is "GENSTAMP_MISMATCH". In this particular scenario, the reason is different, causing a challenge in removing the replicas from the corrupt map.
There is an issue where certain information in an incremental block report (IBR) gets replaced with existing stored information while queuing the block report on a standby Namenode. This can result in false block corruption reports.
In a specific incident, a Namenode transitioned to an active state and started reporting missing blocks with the reason "SIZE_MISMATCH" for corruption. However, upon investigation, it was discovered that the blocks were actually appended, and the sizes on the datanodes were correct.
The root cause was identified as the standby Namenode queuing IBRs with modified information.
Users experiencing this false block corruption issue are advised to run a manual block report, as it can help rectify the erroneous reports and restore accurate block information. This issues is addressed in CDP 7 SP1 and SP2.
As a first step, it is important to verify if the fsck command reports any corrupt blocks.
hdfs fsck / -list-corruptfileblocks -files -blocks -locations
Connecting to namenode via https://host:20102/fsck?ugi=hdfs&listcorruptfileblocks=1&files=1&blocks=1&locations=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
Note: A healthy filesystem should always have zero corrupt files.
Determine the number of missing blocks that have only one replica. These blocks are of critical concern as the chances of recovery are minimal.
Verify the operational status of all datanodes in the cluster. If three or more datanodes are offline or not functioning correctly, it can result in missing and corrupt blocks.
Ensure that you check for any dead datanodes in the cluster. If three or more datanodes are in a dead state, it can lead to the presence of missing and corrupt blocks.
Verify if there are multiple disk failures across the datanodes. In such cases, the presence of missing and corrupt blocks is expected.
Check whether the storage capacity of the datanode is fully utilized, reaching 100%. In such instances, the datanode may become unresponsive and fail to send block reports.
Verify if the block report sent by the datanode is rejected by the namenode. This can be observed in the namenode logs as a warning or error message.
dataLength + is longer than maximum configured RPC length " + maxDataLength + ". RPC came from " + getHostAddress();
Confirm if the user has made changes to the node configuration groups in Cloudera Manager (CM). Such modifications could potentially result in the omission of several data disk directories, leading to issues with missing data.
Verify whether the block still exists in the local filesystem or if it has been unintentionally removed by users.
find <dfs.datanode.data.dir> -type f -iname <blkid>*
Repeat the same step in all datanodes
Ensure that you check if the mount point has been unmounted due to a filesystem failure. It is important to verify the datanode and system logs for any relevant information in this regard.
Verify whether the block has been written into the root volume as a result of disk auto unmount. It is essential to consider that the data might be concealed if you remount the filesystem on top of the existing datanode directory.
Note Examine all the provided scenarios and their corresponding actions. It is important to note that if a block is physically present on any of the datanodes, it is generally recoverable, except in cases where the block is corrupt.