We've been loading a lot of data into our cluster this week and I'd like to see where the growth is coming from. What's the best way to get at this information?
The Reports Manager is your go-to for historical disk reporting.
Every hour, the Reports Manager fetches the HDFS fsimage and indexes its paths and space usage. In Cloudera Manager 5.x, these hourly reports are available under Clusters > General > Reports.
The first three reports there (Current Disk Usage by User | Group | Directory) show how space usage looks right now ("now" meaning within roughly the last hour).
The second group of three reports (Historical Disk Usage by User | Group | Directory) lets you specify date ranges and drill down to per-hour disk usage growth, as far back as the information has been captured.
I imagine the Historical Disk Usage by Directory is, in the end, where you'll find the most relevant info, broken out by directory as you asked in your post's title.
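If you end up copying the hourly per-directory figures out of the report and want to rank directories by growth yourself, something like the following might help. It is only a minimal sketch: the row layout (timestamp, directory, bytes used), the sample values, and the function name growth_by_directory are all hypothetical, not a Cloudera Manager export format or API.

```python
from datetime import datetime


def growth_by_directory(snapshots, start, end):
    """Rank directories by bytes grown between start and end (inclusive).

    snapshots: iterable of (timestamp, directory, bytes_used) tuples.
    This row layout is an assumption made for illustration only.
    """
    first, last = {}, {}
    # Sort chronologically so the first/last sample per directory is correct.
    for ts, directory, used in sorted(snapshots):
        if start <= ts <= end:
            first.setdefault(directory, used)  # earliest sample in the range
            last[directory] = used             # latest sample seen so far
    growth = {d: last[d] - first[d] for d in first}
    # Largest growth first.
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample rows, made up for this example.
sample = [
    (datetime(2015, 6, 1, 0), "/data/raw", 100),
    (datetime(2015, 6, 1, 0), "/user/etl", 50),
    (datetime(2015, 6, 7, 0), "/data/raw", 400),
    (datetime(2015, 6, 7, 0), "/user/etl", 60),
]

for directory, grown in growth_by_directory(
    sample, datetime(2015, 6, 1), datetime(2015, 6, 8)
):
    print(directory, grown)
```

Growth here is simply the last sample in the range minus the first, so a directory that grew and was then cleaned up within the range will show little or no net change.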
Hope this helps!
Is there a CM API call to fetch this information (usage by directory)? I'm aware of a CM API call that fetches it per user at daily and weekly granularity. Is there another call that fetches it by location/directory on a monthly basis?