Explorer
Posts: 9
Registered: ‎11-12-2018
Accepted Solution

Where did my space on Hadoop cluster go?

I have a very weird issue: my Hadoop cluster has run out of space. Upon investigation I found that one of the databases was consuming about 77 TB. However, when I go inside the directory, the total space consumed by all tables is only about 5 TB. So what is consuming the rest of the space, or where did it go?

 

I'm checking space usage with the following command:

 

hadoop fs -du -h /user/hive/warehouse

 

My Cloudera Manager version is 5.13.

Posts: 519
Topics: 14
Kudos: 91
Solutions: 45
Registered: ‎09-02-2016

Re: Where did my space on Hadoop cluster go?

@orak

 

Are you using Cloudera Enterprise by any chance? If so, you can generate a report from CM -> Clusters (top menu) -> Reports -> Directory usage.

 

For more details, please refer to:

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_dg_disk_usage_reports.html#cmug_t...

 

Master
Posts: 402
Registered: ‎07-01-2015

Re: Where did my space on Hadoop cluster go?

Try checking your trash directories as well; they can consume quite a lot of space.
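
A minimal sketch of how trash usage could be checked, assuming the default per-user trash location /user/<username>/.Trash (your cluster may use a different layout). The helper name trash_usage and the HADOOP_CMD override are hypothetical, added only so the command can be dry-run with echo:

```shell
# Hypothetical helper: summarize the size of every user's trash directory.
# Assumes trash lives under <user_root>/<username>/.Trash (the HDFS default).
# HADOOP_CMD can be set to "echo" to print the arguments instead of running them.
trash_usage() {
  local user_root="${1:-/user}"
  # The glob is quoted so that HDFS expands it, not the local shell.
  ${HADOOP_CMD:-hadoop} fs -du -s -h "${user_root}/*/.Trash"
}
```

Calling trash_usage with no arguments checks /user/*/.Trash. If the trash turns out to be large, hadoop fs -expunge can be used to clear out old trash checkpoints, but note that this deletes files permanently.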
Explorer
Posts: 9
Registered: ‎11-12-2018

Re: Where did my space on Hadoop cluster go?

Yes, I'm using Enterprise, though I'm not sure why the report from CM would be any different from what the command line reports. I've checked the report anyway, and it shows the same thing.
Posts: 1,001
Topics: 1
Kudos: 249
Solutions: 126
Registered: ‎04-22-2014

Re: Where did my space on Hadoop cluster go?

@orak,

 

One thing that would help us provide some more suggestions is to understand the following:

 

  • How did you come to know that your "hadoop cluster ran out of space"?  What exactly did you see that told you there was a problem?
  • What did you run to see that a database was using 77TB?  What was the output?
  • What command did you run to see that only 5TB of table data was used?  What was the output?

 

Explorer
Posts: 9
Registered: ‎11-12-2018

Re: Where did my space on Hadoop cluster go?

So the problem was with Snapshots. I had configured snapshots a long time ago on the /user/hive/warehouse directory, and they were still being generated. 

 

I was checking the space with these commands:

hadoop fs -du -h /user/hive

hadoop fs -du -h /user/hive/warehouse

 

Snapshottable directories can be found using the command:

hdfs lsSnapshottableDir

A snapshot can then be deleted with:

hadoop fs -deleteSnapshot <path without .snapshot> <snapshotname>
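
For anyone hitting the same thing, here is a rough sketch of the cleanup, assuming the snapshots sit under /user/hive/warehouse as in my case. The function names and the HDFS_CMD override are just illustrative (setting HDFS_CMD=echo prints the arguments instead of running them); the underlying hdfs subcommands are the standard ones:

```shell
# List the snapshots that exist under a snapshottable directory.
# Snapshots appear as subdirectories of the hidden <dir>/.snapshot path.
list_snapshots() {
  ${HDFS_CMD:-hdfs} dfs -ls "$1/.snapshot"
}

# Delete a single snapshot. The first argument is the snapshottable
# directory itself (without the .snapshot component), the second is
# the snapshot name as shown by list_snapshots.
delete_snapshot() {
  ${HDFS_CMD:-hdfs} dfs -deleteSnapshot "$1" "$2"
}
```

Typical usage would be to run hdfs lsSnapshottableDir first, then list_snapshots /user/hive/warehouse, then delete_snapshot /user/hive/warehouse <snapshotname> for each snapshot that is no longer needed. If whatever creates the snapshots is still scheduled, new ones will keep appearing until that schedule is disabled.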
