HDFS - Missing Blocks Inconsistent


Hi,

 

I have a three-node cluster where different tools report different values for missing blocks. None of the tools show any files with missing blocks. I've looked for similar reports but haven't found a bug or comment that seems to match this issue. Screenshots and output are below. Thanks for any assistance.

 

Using CDH 5.12.0

 

Cloudera Manager reports: "1 missing blocks in the cluster. 33,823 total blocks in the cluster. Percentage missing blocks: 0.00%. Critical threshold: any."

 

The NN legacy UI reports "No missing blocks found at the moment" when clicking for details.

 

NN UI:
[Screenshot: NameNode UI]

 

$ hdfs dfsadmin -report
Configured Capacity: 71542176645120 (65.07 TB)
Present Capacity: 71539894944988 (65.07 TB)
DFS Remaining: 69770020959452 (63.46 TB)
DFS Used: 1769873985536 (1.61 TB)
DFS Used%: 2.47%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 1
Missing blocks (with replication factor 1): 0

 

 

$ hdfs fsck -includeSnapshots /

<snip>

Status: HEALTHY

Total size: 585202873230 B (Total open files size: 332 B)
Total dirs: 8951
Total files: 32467
Total symlinks: 0 (Files currently being written: 5)
Total blocks (validated): 33819 (avg. block size 17303967 B) (Total open file blocks (not validated): 4)
Minimally replicated blocks: 33819 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1

1 REPLY


By default, hdfs fsck only validates files that are closed and persisted on HDFS; it skips files that are open for write (your output shows 4 open file blocks that were not validated). Since you're seeing just one missing block in the CM and NN warnings and none in the fsck output, this suggests the alert comes from a block belonging to a file that is still open, and it is most likely a false alarm.
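
If you want to compare the two numbers yourself, something like the following may help. Treat it as a rough sketch: the two helper function names are made up for illustration, and the parsing assumes the output format shown in your post, which may differ in other releases. The `-openforwrite` option asks fsck to include files that are open for write.

```shell
# Hypothetical helper: print the "Missing blocks:" count from
# `hdfs dfsadmin -report` output read on stdin.
missing_from_report() {
  awk -F': *' '/^Missing blocks:/ { print $2; exit }'
}

# Hypothetical helper: print the number of open-file blocks that fsck
# skipped ("not validated") from an fsck summary read on stdin.
open_blocks_from_fsck() {
  sed -n 's/.*Total open file blocks (not validated): \([0-9]*\).*/\1/p' | head -n 1
}

# On a live cluster (assumes the hdfs CLI is on PATH; not run here):
#   hdfs dfsadmin -report | missing_from_report
#   hdfs fsck / -includeSnapshots | open_blocks_from_fsck
#   hdfs fsck / -openforwrite -files -blocks -locations
```

If the last command reports a problem block under one of the files that are open for write, that would be consistent with the explanation above.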

 

The alert should clear when the NN role or the whole cluster is restarted, for example during your next maintenance window.