My hdfs fsck output shows the following values:
|Total size: 116041111805900 B (Total open files size: 866932 B)|
|Total dirs: 462602|
|Total files: 5214617|
|Total symlinks: 0 (Files currently being written: 64)|
|Total blocks (validated): 5202239 (avg. block size 22305993 B) (Total open file blocks (not validated): 12)|
|Minimally replicated blocks: 5202239 (100.0 %)|
|Over-replicated blocks: 0 (0.0 %)|
|Under-replicated blocks: 0 (0.0 %)|
|Mis-replicated blocks: 0 (0.0 %)|
|Default replication factor: 3|
|Average block replication: 3.0112383|
|Corrupt blocks: 0|
|Missing replicas: 0 (0.0 %)|
|Number of data-nodes: 11|
|Number of racks: 1|
hdfs dfs -df -h
Size Used Available Use%
433.8 T 352.5 T 80.4 T 81%
In Cloudera Manager, the directory usage for / is showing as 314 TB.
I am not able to account for the remaining 38 TB of storage.
Can anyone suggest what is occupying the remaining 38 TB?
Hi @kraravindh ,
We have a knowledge article which may help to explain the question you have:
In case you do not have access to the above, here is a snippet that may help:
In order to understand the difference, we need to know about how df and Cloudera Manager calculate the disk usage.
From the df output, if you sum the space used and the free space, it does not add up to the total space on the disk. The reason is that the actual usable space on the disk is not the same as the disk capacity, because of filesystem overhead and space reserved by the OS. Please see the article "Why The Linux df Command Shows Lesser Free Disk Space?" for more detailed information. The way df calculates %used does not take this into account, which makes it look like there is more % free than there really is.
Cloudera Manager host metrics:
The way Cloudera Manager calculates the used space is to take the 'usable free space' reported by df (i.e. 931G) and subtract that from the total disk capacity (985G), which is what you see here. In other words, CM includes the unusable / reserved overhead on the disk in the used space, while the OS does not.
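As a rough illustration of the calculation described above, here is a minimal Python sketch. The function names are hypothetical and are not part of Cloudera Manager or df; the only real figures are the 985G capacity and 931G free space quoted in the snippet.

```python
def cm_used_gb(total_capacity_gb, df_free_gb):
    """Used space as the snippet says CM computes it:
    total disk capacity minus the usable free space reported by df."""
    return total_capacity_gb - df_free_gb


def reserved_overhead_gb(total_capacity_gb, df_used_gb, df_free_gb):
    """Space that df counts as neither used nor free
    (filesystem overhead plus OS-reserved blocks)."""
    return total_capacity_gb - (df_used_gb + df_free_gb)


# Figures from the snippet: 985G capacity, 931G usable free space.
print(cm_used_gb(985, 931))  # 54
```

Under these assumptions, CM would report 54G used on that disk, and whatever part of that 54G df does not list as used is the reserved overhead.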
You may want to check whether the 'space remaining' reported by each is approximately the same. If so, there should be no concern: the 'space used' generally differs because the Cloudera Manager UI includes the overhead while df does not.
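If you want to script that check, a small sketch along these lines may help. The helper name and the 2% tolerance are assumptions chosen for illustration, not anything Cloudera Manager defines; feed it the two 'space remaining' figures in the same unit.

```python
import math


def remaining_matches(df_free, cm_free, rel_tol=0.02):
    """Return True when the two 'space remaining' figures agree
    to within rel_tol (a hypothetical 2% tolerance by default)."""
    return math.isclose(df_free, cm_free, rel_tol=rel_tol)
```

If this returns True for your hosts, the gap in 'space used' is most likely just the reserved overhead described above.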