HDFS df -h and Cloudera Manager direct report showing different space usage

New Contributor

My hadoop fsck output is below:

.................Status: HEALTHY
 Total size:    116041111805900 B (Total open files size: 866932 B)
 Total dirs:    462602
 Total files:   5214617
 Total symlinks:                0 (Files currently being written: 64)
 Total blocks (validated):      5202239 (avg. block size 22305993 B) (Total open file blocks (not validated): 12)
 Minimally replicated blocks:   5202239 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0112383
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          11
 Number of racks:               1

 

hdfs dfs -df -h
 Size Used Available Use%
433.8 T 352.5 T 80.4 T 81%

 

In Cloudera Manager, the directory usage for / shows as 314 TB.

 

I am not able to account for the remaining 38 TB of storage.
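
As a rough cross-check, the figures above can be tallied in a few lines of Python. This is only a sketch: it assumes the -h output uses binary units (1 T = 1024^4 bytes) and that the Cloudera Manager figure is in the same unit. The constant and function names are made up for illustration.

```python
# Figures copied from the fsck, df and Cloudera Manager output above.
FSCK_TOTAL_BYTES = 116_041_111_805_900  # fsck "Total size" (logical, before replication)
AVG_REPLICATION = 3.0112383             # fsck "Average block replication"
DF_USED_T = 352.5                       # "Used" column of hdfs dfs -df -h
CM_ROOT_USAGE_T = 314.0                 # Cloudera Manager directory usage for /

# Assumption: "T" means 1024**4 bytes in both reports.
BYTES_PER_T = 1024 ** 4


def tally():
    """Return the logical size, the replicated size, and the gaps between reports."""
    logical_t = FSCK_TOTAL_BYTES / BYTES_PER_T
    replicated_t = logical_t * AVG_REPLICATION
    return {
        "logical_t": logical_t,
        "replicated_t": replicated_t,
        "df_minus_cm_t": DF_USED_T - CM_ROOT_USAGE_T,
        "df_minus_fsck_t": DF_USED_T - replicated_t,
    }


if __name__ == "__main__":
    for name, value in tally().items():
        print(f"{name}: {value:.1f}")
```

With these inputs the logical data comes to roughly 105.5 T, or roughly 317.8 T after replication. That is close to the Cloudera Manager figure but still well short of the 352.5 T reported by df.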

 

Can anyone suggest what is occupying the remaining 40 TB of storage?

 

Thanks


Re: HDFS df -h and Cloudera Manager direct report showing different space usage

Community Manager

Hi @kraravindh ,

 

We have a knowledge article that may help answer your question:

https://my.cloudera.com/knowledge/Disk-usage-in-Cloudera-Manager-host-metrics-is-showing?id=83080

 

In case you do not have access to the above, here is an excerpt that may help:

==========

To understand the difference, we need to know how df and Cloudera Manager each calculate disk usage.

df output:
In the df output, the used space plus the free space does not add up to the total space on the disk. This is because the actual usable space on a disk is less than its capacity: part of it is overhead and space reserved by the OS. Please see the article "Why The Linux df Command Shows Lesser Free Disk Space?" for more detail. The way df calculates %used does not take this into account, which makes it look as though a larger percentage is free than really is.

Cloudera Manager host metrics:
Cloudera Manager calculates used space by taking the 'usable free space' reported by df (i.e. 931G) and subtracting it from the total disk capacity (985G), which is what you see here. In other words, CM counts the unusable / reserved overhead on the disk as used space, while the OS does not.
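
The calculation described above can be sketched in Python as follows. This is an illustration of the arithmetic only, not Cloudera Manager's actual code; the function names are hypothetical, and the only figures used are the 985G capacity and 931G usable free space quoted in the excerpt.

```python
def cm_used_gb(total_capacity_gb, usable_free_gb):
    """Used space as the excerpt says Cloudera Manager reports it:
    total capacity minus the usable free space reported by df."""
    return total_capacity_gb - usable_free_gb


def reserved_overhead_gb(total_capacity_gb, df_used_gb, usable_free_gb):
    """Space that is neither 'used' nor 'available' in df output,
    i.e. the overhead / reserved space that CM folds into 'used'."""
    return total_capacity_gb - df_used_gb - usable_free_gb


if __name__ == "__main__":
    # Figures from the excerpt: 985G capacity, 931G usable free space.
    print(cm_used_gb(985, 931))
```

Because the free-space term is the same in both tools, the 'space remaining' figures should agree even when the 'used' figures do not.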

==========

 

You may want to check whether the 'space remaining' reported by each is approximately the same. If so, there should be no cause for concern: the space used generally differs because the Cloudera Manager UI includes the overhead while df does not.

 

Thanks,
Li

Li Wang, Technical Resolution Manager

