Explorer
Posts: 14
Registered: ‎12-19-2013

How can I see HDFS space consumption over time broken out by directory?

We've been loading a lot of data into our cluster this week and I'd like to see where the growth is coming from. What's the best way to get at this information?

Cloudera Employee
Posts: 79
Registered: ‎08-29-2013

Re: How can I see HDFS space consumption over time broken out by directory?

The Reports Manager is your go-to for historical disk reporting.

 

Every hour, the Reports Manager grabs the HDFS fsimage and indexes its paths and space usage. In Cloudera Manager 5.x, these hourly reports are available under Clusters > General > Reports.

 

The first three reports there (Current Disk Usage by User | Group | Directory) show how space usage looks right now ("now" being within the last ~hour).

 

The second group of three reports (Historical Disk Usage by User | Group | Directory) lets you specify date ranges and drill down to per-hour disk usage growth, as far back as the information has been captured.

 

I imagine the Historical Disk Usage by Directory report is where you'll find the most relevant info, since it's broken out by directory as you asked in your post's title.
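If you ever want a similar per-directory growth view outside of Cloudera Manager, here is a rough sketch of one way to do it by hand. To be clear, this is not how the Reports Manager works internally. It assumes you have saved the text output of `hdfs dfs -du <path>` at two points in time, that the first column of each line is the size in bytes and the last column is the path, and that paths contain no whitespace. The function names `parse_du_output` and `growth_by_directory` are made up for this example.

```python
def parse_du_output(text):
    """Turn saved `hdfs dfs -du` output into a {path: size_in_bytes} dict.

    Assumption: first column is the size in bytes, last column is the path.
    Lines that do not start with a number are skipped.
    """
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        usage[parts[-1]] = int(parts[0])
    return usage


def growth_by_directory(old, new):
    """Return (path, byte_delta) pairs, largest growth first.

    A directory missing from one snapshot is treated as 0 bytes there.
    """
    deltas = {
        path: new.get(path, 0) - old.get(path, 0)
        for path in set(old) | set(new)
    }
    # Sort by growth descending, then by path so ties come out in a stable order.
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical snapshots taken some time apart (made-up numbers).
monday = "100  300  /data/a\n50  150  /data/b\n"
friday = "400  1200  /data/a\n50  150  /data/b\n10  30  /data/c\n"

for path, delta in growth_by_directory(parse_du_output(monday),
                                       parse_du_output(friday)):
    print(path, delta)
```

Sorting by the delta puts the fastest-growing directories at the top, which is usually what you want when tracking down where new data landed.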

 

Hope this helps!

--

Mark Schnegelberger

Contributor
Posts: 40
Registered: ‎07-06-2018

Re: How can I see HDFS space consumption over time broken out by directory?

@smark 

 

Is there a CM API call to fetch this information (usage by directory)? I'm aware of a CM API call that fetches it per user at a daily and weekly level. Is there another that fetches it by location/directory on a monthly basis?

 

Thanks.