I need to collect the disk usage of particular folders in HDFS every day. Can someone please suggest an efficient way to capture disk usage for selected folders?
I see that we can run the du (disk usage) command on the required folders every time I want the metrics. Is there a more efficient way than this? Is there a way to configure Ambari to capture the du output for selected folders?
I would create a simple shell script that runs 'du' commands for the needed folders --> redirect the script's results to an output file (current timestamp, size) --> schedule the script in crontab --> whenever required, just look at the output file.
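The same flow can also be sketched in Python instead of shell. This is only a rough sketch under some assumptions: the folder list and output path are made up, the `hdfs` binary is on the PATH of the cron user, and the first column printed by `hdfs dfs -du -s` is the size in bytes (newer Hadoop versions print an extra column for disk space consumed, so the parser takes the first and last fields). Paths containing spaces would need extra handling.

```python
import csv
import subprocess
from datetime import datetime, timezone

# Hypothetical folder list and output file; replace with your own.
FOLDERS = ["/data/raw", "/data/processed"]
OUTPUT_FILE = "/var/log/hdfs_du.csv"


def parse_du_summary(line):
    """Parse one line of `hdfs dfs -du -s` output into (path, size_bytes).

    Depending on the Hadoop version the line has two columns (size, path)
    or three (size, disk space consumed, path), so the first field is
    taken as the size and the last field as the path.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("unexpected du output: %r" % line)
    return fields[-1], int(fields[0])


def folder_size(path):
    """Run `hdfs dfs -du -s <path>` and return the size in bytes."""
    out = subprocess.check_output(["hdfs", "dfs", "-du", "-s", path])
    return parse_du_summary(out.decode("utf-8").strip())[1]


def record_usage(folders, output_file, size_fn=folder_size):
    """Append one row (UTC timestamp, folder, size in bytes) per folder."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    with open(output_file, "a", newline="") as handle:
        writer = csv.writer(handle)
        for path in folders:
            writer.writerow([now, path, size_fn(path)])
```

Calling `record_usage(FOLDERS, OUTPUT_FILE)` at the bottom of the script and scheduling it with a crontab entry such as `0 6 * * * python3 /opt/scripts/hdfs_du.py` (the script path is hypothetical) would append one row per folder every morning.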
If needed, set a threshold --> write another script that raises an alert --> post it to Ambari using the REST API.
Please refer to the blog below for more details on the second part.
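The threshold step could look roughly like the sketch below. It assumes each line of the output file holds three comma-separated values (timestamp, folder, size in bytes), and the state names (OK, WARNING, CRITICAL, UNKNOWN) and the function names are chosen for illustration only. Posting the result to Ambari is left out on purpose, since the endpoint and payload depend on how the alert is defined in your cluster.

```python
import csv


def latest_sizes(csv_path):
    """Return {folder: size_bytes} using the last row seen for each folder."""
    sizes = {}
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            _timestamp, folder, size = row
            sizes[folder] = int(size)
    return sizes


def classify(size_bytes, warn_bytes, crit_bytes):
    """Map a size to a state string using two thresholds."""
    if size_bytes >= crit_bytes:
        return "CRITICAL"
    if size_bytes >= warn_bytes:
        return "WARNING"
    return "OK"


def check_thresholds(csv_path, thresholds):
    """Classify each folder; `thresholds` maps folder -> (warn, crit) in bytes."""
    sizes = latest_sizes(csv_path)
    results = {}
    for folder, (warn, crit) in thresholds.items():
        if folder not in sizes:
            results[folder] = "UNKNOWN"  # no measurement recorded yet
        else:
            results[folder] = classify(sizes[folder], warn, crit)
    return results
```

For example, `check_thresholds("/var/log/hdfs_du.csv", {"/data/raw": (10**12, 2 * 10**12)})` would warn at 1 TB and go critical at 2 TB for that (hypothetical) folder.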
Hope this answers your question!
If you are looking at creating custom alerts with Ambari, a Python script may be a good option. Snakebite, a pure-Python HDFS client, is more efficient than running the hdfs command because it talks to the NameNode directly and avoids starting a JVM for each call: http://snakebite.readthedocs.io/en/latest/client.html
If you run it only once a day, hdfs dfsadmin performance should not be an issue.
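A minimal sketch of the Snakebite approach is below. It assumes snakebite is installed and that the client's `du()` method yields dictionaries with `path` and `length` keys, as described in the linked client documentation; the NameNode host, port, and folder names are placeholders. The original snakebite package was written for Python 2, so check which Python versions your release supports.

```python
def directory_sizes(client, paths):
    """Return {path: length_in_bytes} for each directory in `paths`.

    `client` is expected to behave like snakebite.client.Client: its du()
    method yields dictionaries with 'path' and 'length' keys. Asking for
    the top level only gives one summary entry per directory.
    """
    entries = client.du(paths, include_toplevel=True, include_children=False)
    return {entry["path"]: entry["length"] for entry in entries}


def make_client(host, port):
    """Build a snakebite client for the given NameNode host and RPC port."""
    # Imported lazily so directory_sizes() can be tried without snakebite.
    from snakebite.client import Client
    return Client(host, port)
```

Usage would be along the lines of `directory_sizes(make_client("namenode.example.com", 8020), ["/data/raw"])`, where the host name and port are placeholders for your NameNode.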
To expand on mqureshi's answer: https://community.hortonworks.com/articles/16846/how-to-identify-what-is-consuming-space-in-hdfs.htm...
Creating an Ambari alert: https://github.com/monolive/ambari-custom-alerts/tree/master/spaceQuota