How can I see HDFS space consumption over time broken out by directory?

Explorer

We've been loading a lot of data into our cluster this week and I'd like to see where the growth is coming from. What's the best way to get at this information?

2 REPLIES

Re: How can I see HDFS space consumption over time broken out by directory?

Expert Contributor

The Reports Manager is your go-to for historical disk reporting.

 

Every hour, the Reports Manager grabs the HDFS fsimage and indexes its paths and space usage. The resulting hourly reports are available (in Cloudera Manager 5.x) under Clusters > General > Reports.
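
To make the idea of indexing paths and space usage concrete, here is a rough Python sketch of rolling file sizes up into per-directory totals. This is not the Reports Manager's actual code, only an illustration of the concept; the function name `rollup_by_directory` and the sample paths and sizes are made up for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rollup_by_directory(files):
    """Sum file sizes into every ancestor directory.

    `files` is an iterable of (absolute_path, size_in_bytes) pairs.
    Hypothetical helper for illustration only.
    """
    usage = defaultdict(int)
    for path, size in files:
        # .parents yields /data/raw, /data, / for /data/raw/a.csv
        for parent in PurePosixPath(path).parents:
            usage[str(parent)] += size
    return dict(usage)


# Made-up entries standing in for what an fsimage listing might contain.
sample_files = [
    ("/data/raw/a.csv", 100),
    ("/data/raw/b.csv", 50),
    ("/data/clean/c.parquet", 30),
    ("/tmp/scratch.bin", 5),
]

usage_by_dir = rollup_by_directory(sample_files)
for directory in sorted(usage_by_dir):
    print(directory, usage_by_dir[directory])
```

With these sample entries, /data totals 180 bytes and the root totals 185, because each file is counted once in every directory above it.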

 

The first three reports there (Current Disk Usage by User | Group | Directory) show how space usage looks right now ("now" meaning within roughly the last hour).

 

The second group of three reports (Historical Disk Usage by User | Group | Directory) lets you specify date ranges and drill down to per-hour disk usage growth, as far back as the information has been captured.

 

I imagine the Historical Disk Usage by Directory report is where you'll find the most relevant info, since it breaks usage out by directory as you asked in your post's title.
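
Once you have the hourly figures in hand (however you get them out of the report), ranking directories by growth over a window is simple. Here is a minimal Python sketch, assuming samples of the form (timestamp, directory, bytes used). The function name `growth_by_directory`, the tuple layout, and the sample values are all hypothetical and not a Cloudera Manager format.

```python
from datetime import datetime


def growth_by_directory(samples, start, end):
    """Return {directory: growth_in_bytes} for samples within [start, end].

    Growth is the last sample minus the first sample in the window,
    sorted so the fastest-growing directory comes first.
    """
    first, last = {}, {}
    for ts, directory, used in sorted(samples):
        if start <= ts <= end:
            first.setdefault(directory, used)  # keep the earliest value
            last[directory] = used             # overwrite with the latest
    growth = {d: last[d] - first[d] for d in first}
    return dict(sorted(growth.items(), key=lambda kv: kv[1], reverse=True))


# Made-up hourly samples: (timestamp, directory, bytes used).
samples = [
    (datetime(2019, 1, 7, 0), "/data/raw", 1_000),
    (datetime(2019, 1, 7, 0), "/user/etl", 400),
    (datetime(2019, 1, 7, 1), "/data/raw", 1_600),
    (datetime(2019, 1, 7, 1), "/user/etl", 450),
    (datetime(2019, 1, 7, 2), "/data/raw", 2_500),
    (datetime(2019, 1, 7, 2), "/user/etl", 430),
]

growth = growth_by_directory(
    samples, datetime(2019, 1, 7, 0), datetime(2019, 1, 7, 2)
)
for directory, delta in growth.items():
    print(directory, delta)
```

With the sample data, /data/raw comes out on top with 1,500 bytes of growth and /user/etl follows with 30. Comparing the first and last samples only shows net change, so a directory that grew and was then cleaned up within the window will look flat.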

 

Hope this helps!

--

Mark Schnegelberger

Re: How can I see HDFS space consumption over time broken out by directory?

Rising Star

@smark 

 

Is there a CM API call to fetch this information (usage by directory)? I'm aware of a CM API call that fetches it per user at daily and weekly granularity. Is there another one that fetches it by location/directory on a monthly basis?

 

Thanks.