Created 05-15-2018 11:45 AM
I am using HDP 2.6.4 and Ambari 2.6.1.5. On the Ambari HDFS summary page there is a metric called "Disk Usage (DFS Used)", which in my case shows 19GB. If I run hdfs dfs -du -h / it gives a total of 6GB. Shouldn't these two results be the same, or am I missing something here?
Created 05-15-2018 04:46 PM
This comes down to the HDFS replication factor, which defaults to 3. Files you place on HDFS are stored 3 times on disks across the cluster, for redundancy and tolerance of node failures. Therefore your 'du -h' gives you the sum of the sizes of the files you have placed on HDFS, whereas the HDFS disk usage metric gives you the total disk space consumed across all replicas.
6.XX GB * 3 replication factor = ~19 GB
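The same arithmetic can be run in reverse as a quick sanity check. This is only a minimal sketch: the function name `estimate_logical_gb` is made up for illustration, and it assumes every file uses the same replication factor (3 here), which may not hold if some files were written with a different setting.

```python
def estimate_logical_gb(dfs_used_gb: float, replication: int = 3) -> float:
    """Estimate the logical (du-style) size from the raw DFS Used figure.

    Assumes a uniform replication factor across all files.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return dfs_used_gb / replication


# 19 GB is the "Disk Usage (DFS Used)" value reported in the question.
print(round(estimate_logical_gb(19), 2))  # 6.33
```

A result a little above 6 GB is consistent with the 6GB total that du reported.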
Created 05-16-2018 09:35 AM
@anarasimham, thanks for the info. Is there any documentation that states this?
Created 05-16-2018 12:06 PM
I couldn't find any documentation on this specific calculation, but you can understand it through testing as you have already. If you'd like to verify, insert a 2GB file into HDFS and get measurements before and after the insert. You should see the numbers change by the respective amounts.
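To know what to look for in that test, the expected changes can be worked out ahead of time. The sketch below is illustrative only: `expected_deltas_gb` is a hypothetical helper, and it assumes the test file is written with the default replication factor of 3.

```python
def expected_deltas_gb(file_size_gb: float, replication: int = 3) -> tuple:
    """Return (change in du total, change in DFS Used) for one new file.

    Assumes the file is stored with the given replication factor.
    """
    du_delta = file_size_gb                      # du counts the file once
    dfs_used_delta = file_size_gb * replication  # DFS Used counts every replica
    return du_delta, dfs_used_delta


du_delta, dfs_used_delta = expected_deltas_gb(2)
print(f"du total should grow by about {du_delta} GB")
print(f"DFS Used should grow by about {dfs_used_delta} GB")
```

If the measured change in DFS Used is noticeably different from three times the file size, the file was probably written with a different replication factor.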