
How to find the du for folders in HDFS

New Contributor

Every day I need to collect the disk usage of particular folders in HDFS. Can someone please suggest an efficient way to capture the disk usage of selected folders?

I see that we can run the du (disk usage) command on the required folders every time I want the metrics. Is there a more efficient way than this? Is there a way I can configure Ambari to capture the du for selected folders?


Super Guru
@Varma vetukuri

I would create a simple shell script containing 'du' commands for the needed folders --> redirect the script's results to an output file (current timestamp, size) --> schedule the script in crontab --> whenever required, just look at the output file.
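The workflow above could be sketched as the bash script below. It is a minimal sketch, not a tested production job: the script name `hdfs_du.sh`, the `HDFS_BIN` override and the `timestamp,folder,bytes` CSV layout are assumptions. It passes all folders to a single `hdfs dfs -du -s` call, since every `hdfs` invocation starts a new JVM; each output line begins with the size in bytes and ends with the path.

```shell
#!/usr/bin/env bash
# Sketch: append "timestamp,folder,bytes" lines to a CSV for the given HDFS folders.
# Usage (hypothetical paths): hdfs_du.sh /var/log/hdfs_du.csv /data/sales /user/etl
set -euo pipefail

# hdfs client to call; an assumed override, e.g. a full path when run from cron
HDFS_BIN="${HDFS_BIN:-hdfs}"

record_du() {
  local out_file="$1"; shift
  local ts
  ts="$(date '+%Y-%m-%d %H:%M:%S')"
  # One 'hdfs dfs -du -s' call for all folders. Each stdout line looks like
  # "size [replicated_size] path": field 1 is bytes, the last field is the path.
  # '|| true' keeps the lines of readable folders if one folder fails
  # (errors go to stderr, so they never reach awk).
  { "$HDFS_BIN" dfs -du -s "$@" || true; } \
    | awk -v ts="$ts" 'NF >= 2 { print ts "," $NF "," $1 }' >> "$out_file"
}

# Run only when given the CSV path plus at least one folder.
if [ "$#" -ge 2 ]; then
  record_du "$@"
fi
```

A crontab entry such as `0 1 * * * /opt/scripts/hdfs_du.sh /var/log/hdfs_du.csv /data/sales /user/etl` (hypothetical paths) would run it daily at 01:00. Because cron uses a minimal PATH, setting `HDFS_BIN` to the full path of the hdfs client may be necessary. The awk parsing assumes folder names without spaces or commas.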

If needed, set a threshold --> write another script that raises an alert when the threshold is crossed --> post the alert to Ambari using the REST API.

Please refer to the blog post below for more details on the second part.

https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.htm...
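For the threshold step, a minimal sketch could check the latest size recorded for each folder against one byte limit. It assumes a hypothetical `timestamp,folder,bytes` CSV appended in time order; it prints `CRITICAL` lines and returns 2 when any folder is over the limit, or prints `OK` and returns 0 otherwise. The 0/2 exit-code convention and the `check_du` name are assumptions; wiring the result into Ambari is covered by the blog post above.

```shell
#!/usr/bin/env bash
# Sketch: compare the most recent recorded size of each folder with a byte limit.
# Usage (hypothetical): check_du.sh /var/log/hdfs_du.csv 1099511627776
set -euo pipefail

# check_du CSV LIMIT_BYTES
# Assumed CSV layout: "timestamp,folder,bytes", one line per folder per run.
check_du() {
  local csv="$1" limit="$2"
  awk -F, -v limit="$limit" '
    # later lines overwrite earlier ones, so this keeps the newest size per folder
    { last[$2] = $3 }
    END {
      status = 0
      for (dir in last) {
        if (last[dir] + 0 > limit + 0) {
          printf "CRITICAL %s uses %s bytes (limit %s)\n", dir, last[dir], limit
          status = 2
        }
      }
      if (status == 0) print "OK all folders under limit"
      exit status
    }' "$csv"
}

# Run only when given the CSV path and the limit.
if [ "$#" -eq 2 ]; then
  check_du "$1" "$2"
fi
```

In the usage line, 1099511627776 bytes is 1 TiB; a per-folder limit would need a second lookup table, which this sketch leaves out.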

Hope this answers your question!

Super Guru
@Varma vetukuri

I think you are looking for "hdfs dfsadmin -report".

sudo -u hdfs hdfs dfsadmin -report
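Note that `hdfs dfsadmin -report` summarises capacity for the whole cluster and then for each DataNode; it does not break usage down by folder. If the cluster-wide figure is enough, a sketch like the one below pulls the first `DFS Used:` line (the cluster summary) out of the report. The function name is hypothetical, and the parsing assumes the usual `DFS Used: <bytes> (<human-readable>)` line format.

```shell
#!/usr/bin/env bash
# Sketch: print the cluster-wide "DFS Used" value, in bytes, from a
# 'hdfs dfsadmin -report' read on stdin.
# The first "DFS Used:" line is the cluster summary; later ones are per DataNode.
# "DFS Used%:" lines do not match, because "%" comes before the colon.
dfs_used_bytes() {
  awk -F': ' '/^DFS Used:/ { split($2, parts, " "); print parts[1]; exit }'
}
```

Usage, as the hdfs superuser: `sudo -u hdfs hdfs dfsadmin -report | dfs_used_bytes`.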

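If you are looking at creating custom alerts with Ambari, a Python script may be a good option. Snakebite is more efficient than running the hdfs command, because it is a pure Python client that talks to the NameNode over RPC instead of starting a JVM for every call. http://snakebite.readthedocs.io/en/latest/client.html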

If it runs only once a day, the performance of hdfs dfsadmin should not be an issue.

To expand on mqureshi's answer: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.htm...

Creating an Ambari alert: https://github.com/monolive/ambari-custom-alerts/tree/master/spaceQuota

cheers,

Andrew