Query on Disk Usage (DFS Used) in Ambari UI

Rising Star

I am using HDP 2.6.4 and Ambari 2.6.1.5. On the Ambari HDFS summary page there is a metric called "Disk Usage (DFS Used)", which in my case shows 19 GB. If I run hdfs dfs -du -h /, it gives a total of 6 GB. Shouldn't these two results be the same, or am I missing something here?


1 ACCEPTED SOLUTION

Super Collaborator
@Vinuraj M

This comes down to the HDFS replication factor, which defaults to 3. Each file you place on HDFS is stored 3 times on disks across the cluster for redundancy and tolerance of node failures. Therefore 'du -h' gives you the sum of the sizes of the files you have placed on HDFS, whereas the HDFS disk usage metric gives you the total disk space consumed by all replicas.

6.XX GB * 3 replication factor = ~19 GB
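The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming every file uses the same replication factor and ignoring any other overhead; the function names are made up for this example and are not part of any Hadoop or Ambari API.

```python
def raw_usage_gb(logical_gb: float, replication: int = 3) -> float:
    """Approximate raw disk usage if every block is stored `replication` times."""
    return logical_gb * replication


def implied_logical_gb(dfs_used_gb: float, replication: int = 3) -> float:
    """Approximate logical file size implied by a raw DFS Used figure."""
    return dfs_used_gb / replication


if __name__ == "__main__":
    # 19 GB is the "Disk Usage (DFS Used)" value reported in the question.
    logical = implied_logical_gb(19)
    print(f"Implied logical size: {logical:.2f} GB")
    print(f"Raw usage at replication 3: {raw_usage_gb(logical):.2f} GB")
```

Files written with a different replication factor would shift the ratio, so treat the result as an estimate.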

3 REPLIES

Rising Star

@anarasimham, thanks for the info. Is there any documentation that states this?

Super Collaborator

I couldn't find any documentation on this specific calculation, but you can understand it through testing, as you already have. If you'd like to verify, insert a 2 GB file into HDFS and take measurements before and after the insert. With a replication factor of 3, 'du -h' should increase by about 2 GB and Disk Usage (DFS Used) by about 6 GB.
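To compare the before and after measurements from such a test, a small helper like the one below may be handy. It is a sketch only: the function name is hypothetical, and the "after" values in the usage example are invented for illustration (the "before" values are the 6 GB and 19 GB figures from the question). You would substitute the numbers you actually read from 'du -h' and the Ambari metric.

```python
def observed_replication(du_before_gb: float, du_after_gb: float,
                         dfs_before_gb: float, dfs_after_gb: float) -> float:
    """Ratio of the change in DFS Used to the change in 'du' output.

    If the inserted file uses the default replication factor,
    this ratio should come out close to 3.
    """
    du_delta = du_after_gb - du_before_gb
    if du_delta == 0:
        raise ValueError("'du' output did not change; nothing to compare")
    return (dfs_after_gb - dfs_before_gb) / du_delta


if __name__ == "__main__":
    # Before: values from the question. After: hypothetical readings
    # following the insert of a 2 GB file.
    ratio = observed_replication(6.0, 8.0, 19.0, 25.0)
    print(f"Observed replication ratio: {ratio:.1f}")
```

The metric shown in Ambari may lag slightly behind the actual state, so it can help to wait a little before taking the second reading.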