
How to find the du for folders in HDFS

New Contributor

Every day I need to collect the disk usage of particular folders in HDFS. Can someone please suggest an efficient way to capture disk usage for selected folders?

I see we can run the du (disk usage) command on the required folders every time I want the metrics. Is there a more efficient way than this? Is there a way I can configure Ambari to capture the du for selected folders?


Super Guru
@Varma vetukuri

I would create a simple shell script containing 'du' commands for the needed folders, and then:
1. Redirect the script's results to an output file (current timestamp, size).
2. Schedule the script in crontab.
3. Whenever required, just look at the output file.
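The steps above can be sketched as a small POSIX shell script. This is only a sketch under assumptions: the variable names HDFS_DU and HDFS_DU_LOG, the log path and the CSV layout (timestamp,folder,bytes) are hypothetical, and it relies on the first column of 'hdfs dfs -du -s' being the size in bytes.

```shell
#!/bin/sh
# Minimal sketch: append "timestamp,folder,bytes" lines for selected HDFS
# folders to a CSV file. HDFS_DU and HDFS_DU_LOG are hypothetical names.
set -eu

# Command used to measure a folder; can be overridden via the environment.
HDFS_DU=${HDFS_DU:-"hdfs dfs -du -s"}
# Output file the results are appended to (hypothetical location).
HDFS_DU_LOG=${HDFS_DU_LOG:-/var/tmp/hdfs_du.csv}

# Print one CSV line per folder passed as an argument.
collect_du() {
  ts=$(date '+%Y-%m-%dT%H:%M:%S')
  for folder in "$@"; do
    # The first column of 'hdfs dfs -du -s' is the size in bytes
    # (newer releases add a second column with replicated disk usage).
    bytes=$($HDFS_DU "$folder" | awk '{print $1}')
    if [ -z "$bytes" ]; then
      echo "could not measure $folder" >&2
      return 1
    fi
    echo "$ts,$folder,$bytes"
  done
}

# When folders are given on the command line, append their usage to the log.
if [ "$#" -gt 0 ]; then
  collect_du "$@" >> "$HDFS_DU_LOG"
fi
```

For a daily run, a crontab entry such as `0 1 * * * /opt/scripts/hdfs_du.sh /data/sales /data/logs` (hypothetical script path and folders) would append one line per folder each night; the cron user needs read access to those folders.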

If needed, set a threshold, write another script that raises an alert when the threshold is exceeded, and post that alert to Ambari using the REST API.
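The threshold check could look roughly like the sketch below. It assumes the usage is recorded as timestamp,folder,bytes lines in a CSV file (a hypothetical format), and the function name and return codes are illustrative choices. Posting the result to Ambari is deliberately left out, since that depends on how the alert is set up.

```shell
#!/bin/sh
# Minimal sketch: compare the most recently recorded size of a folder with
# a byte threshold. The CSV format and function name are hypothetical.
set -eu

# Usage: check_threshold CSV_FILE FOLDER MAX_BYTES
# Prints a status line and returns 0 (OK), 2 (over threshold) or 3 (no data).
check_threshold() {
  csv=$1 folder=$2 max=$3
  [ -r "$csv" ] || { echo "UNKNOWN: cannot read $csv"; return 3; }
  # Keep the size from the last line recorded for this folder.
  bytes=$(awk -F, -v f="$folder" '$2 == f { b = $3 } END { print b }' "$csv")
  if [ -z "$bytes" ]; then
    echo "UNKNOWN: no usage recorded for $folder"
    return 3
  fi
  if [ "$bytes" -gt "$max" ]; then
    echo "CRITICAL: $folder uses $bytes bytes (threshold $max)"
    return 2
  fi
  echo "OK: $folder uses $bytes bytes (threshold $max)"
}
```

A non-zero return code gives the follow-up step (for example a call to the Ambari REST API) a simple signal to act on.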

Please refer to the blog below for more details on the second part.

Hope this answers your question!

Super Guru
@Varma vetukuri

I think you are looking for "hdfs dfsadmin -report".

sudo -u hdfs hdfs dfsadmin -report
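Note that 'hdfs dfsadmin -report' prints cluster-wide capacity figures followed by per-DataNode figures, not per-folder usage. If the cluster total is enough, the sketch below pulls the first "DFS Used:" value (the cluster summary) out of the report text. It assumes the usual report layout ("DFS Used: <bytes> (<human-readable>)"), and the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: extract the cluster-wide "DFS Used" byte count from the
# text printed by 'hdfs dfsadmin -report'. Function name is hypothetical.
set -eu

# Reads report text on stdin and prints the first "DFS Used:" value.
# The first occurrence is the cluster summary; later ones are per DataNode.
# "DFS Used%:" lines do not match because the pattern requires "Used:".
dfs_used_bytes() {
  awk '/^DFS Used:/ { print $3; exit }'
}

# Example (run with HDFS superuser rights, as in the command above):
#   sudo -u hdfs hdfs dfsadmin -report | dfs_used_bytes
```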

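If you are looking to create custom alerts with Ambari, a Python script may be a good option. Snakebite, a pure-Python HDFS client, is more efficient than running the hdfs command because it talks to the NameNode directly instead of starting a JVM for every call.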

If it only runs once a day, the performance of hdfs dfsadmin should not be an issue.

To expand on mqureshi's answer:

Creating an Ambari Alert