Created 07-20-2016 06:58 PM
How can I check the total amount of data stored on an HDP cluster?
Created on 07-20-2016 08:30 PM - edited 08-19-2019 01:33 AM
1) In the Ambari Dashboard, hover your mouse over the "HDFS Disk Usage" widget (upper left-hand corner) and it will show you the following details:
DFS Used: Storage used for data
Non-DFS Used: Storage used on the same disks by non-HDFS files, such as logs and shuffle writes
Remaining: Remaining storage
2) From the command line you can also run "sudo -u hdfs hdfs dfsadmin -report", which generates a full report of HDFS storage usage for the cluster and for each DataNode.
3) Finally, if you would like to check the disk usage of a particular folder (and its subfolders), you can use commands like "hadoop fsck", "hadoop fs -dus" (deprecated in Hadoop 2.x in favor of "hadoop fs -du -s") or "hadoop fs -count -q". For an explanation of the differences between these commands, as well as how to read the results, please take a look at this post:
http://www.michael-noll.com/blog/2011/10/20/understanding-hdfs-quotas-and-hadoop-fs-and-fsck-tools/
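If you want the cluster-wide numbers from option 2 in a script rather than reading the report by eye, something like the following sketch may help. It assumes the summary lines of the report look like "DFS Used: 278528 (272 KB)"; the helper names report_field and to_gib are made up for this example, and the sample report is illustrative, not output from a real cluster.

```shell
#!/usr/bin/env bash
# Sketch: pull cluster-wide totals out of `hdfs dfsadmin -report` output.
# Assumption: summary lines have the form "LABEL: <bytes> (<human size>)".

# report_field LABEL
# Reads a report on stdin and prints the byte count of the FIRST line whose
# label matches exactly. The first match is the cluster summary; the
# per-DataNode sections further down repeat the same labels.
report_field() {
  awk -F': ' -v label="$1" '
    !found && $1 == label { split($2, parts, " "); print parts[1]; found = 1 }
  '
}

# to_gib BYTES
# Converts a byte count to GiB with two decimals.
to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f GiB\n", bytes / (1024 * 1024 * 1024) }'
}

# Illustrative sample only (hypothetical values, not from a real cluster).
sample_report() {
  cat <<'EOF'
Configured Capacity: 42275430400 (39.37 GB)
Present Capacity: 33833799680 (31.51 GB)
DFS Remaining: 33833521152 (31.51 GB)
DFS Used: 278528 (272 KB)
DFS Used%: 0.00%
EOF
}

# Demo against the sample; on a real cluster, replace `sample_report` with
# `sudo -u hdfs hdfs dfsadmin -report`.
for label in "Configured Capacity" "DFS Used" "DFS Remaining"; do
  bytes=$(sample_report | report_field "$label")
  printf '%-20s %s bytes (%s)\n' "$label:" "$bytes" "$(to_gib "$bytes")"
done
```

Keep in mind that "DFS Used" is raw disk usage, so it counts every replica of each block. With the default replication factor of 3, the logical amount of data is roughly one third of that figure. For a single directory, "hadoop fs -du -s -h /some/path" (path is a placeholder) prints a human-readable total directly.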