Where did my space on Hadoop cluster go?

Explorer

I have a very weird issue: my Hadoop cluster has run out of space. Upon investigation I found that one of the databases was consuming about 77 TB of space. However, when I look inside the directory, the total space consumed by all tables is only about 5 TB. So what is consuming the rest of the space, or where did it go?

 

I'm checking space usage with the following command:

 

hadoop fs -du -h /user/hive/warehouse

 

My Cloudera Manager version is 5.13.

1 ACCEPTED SOLUTION

Explorer

The problem turned out to be snapshots. I had configured snapshots on the /user/hive/warehouse directory a long time ago, and they were still being generated.

 

I was checking space usage with these commands:

hadoop fs -du -h /user/hive

hadoop fs -du -h /user/hive/warehouse

 

Snapshottable directories can be listed with:

hdfs lsSnapshottableDir

A snapshot can then be deleted with:

hadoop fs -deleteSnapshot <path without .snapshot> <snapshotname>
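
For anyone with many snapshots to clean up, here is a rough sketch of how the two steps could be chained. It relies on `hdfs lsSnapshottableDir`, `hdfs dfs -ls <dir>/.snapshot` and `hdfs dfs -deleteSnapshot <dir> <name>`; the helper names `snapshottable_paths` and `delete_commands` and the `RUN_ON_CLUSTER` switch are made up for this example. It assumes the path is the last column of each output line and that paths contain no whitespace. It only prints the delete commands, so they can be reviewed before anything is run.

```shell
# Sketch only: prints deleteSnapshot commands, never deletes anything itself.

# Read `hdfs lsSnapshottableDir` output on stdin and print the directory paths.
# Assumption: the path is the last column and contains no whitespace.
snapshottable_paths() {
  awk 'NF > 0 { print $NF }'
}

# Read `hdfs dfs -ls <dir>/.snapshot` output on stdin and print one
# deleteSnapshot command per snapshot. $1 is the snapshottable directory.
delete_commands() {
  awk -v dir="$1" '
    $NF ~ /\/\.snapshot\// {
      n = split($NF, parts, "/")   # snapshot name is the last path component
      print "hdfs dfs -deleteSnapshot " dir " " parts[n]
    }'
}

# RUN_ON_CLUSTER is a hypothetical switch for this example; set it to 1 on a
# node with the hdfs client to list the commands for the real cluster.
if [ "${RUN_ON_CLUSTER:-0}" = "1" ]; then
  hdfs lsSnapshottableDir | snapshottable_paths | while read -r dir; do
    hdfs dfs -ls "$dir/.snapshot" | delete_commands "$dir"
  done
fi
```

Review the printed commands and run only those for snapshots that are really no longer needed. If a snapshot policy is still scheduled, new snapshots will keep appearing until that policy is disabled.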


REPLIES

Champion

@orak

 

Are you using Cloudera Enterprise by any chance? If so, you can generate a report from CM -> Clusters (top menu) -> Reports -> Directory Usage.

 

For more details, please refer to:

https://www.cloudera.com/documentation/enterprise/5-13-x/topics/cm_dg_disk_usage_reports.html#cmug_t...

 

Explorer
Yes, I'm using Enterprise, though I'm not sure why the report from CM would differ from what the command line reports. I've checked the report anyway, and it shows the same numbers.

Master Guru

@orak,

 

It would help us provide more suggestions if we understood the following:

 

  • How did you come to know that your "hadoop cluster ran out of space"? What exactly did you see that told you there was a problem?
  • What did you run to see that a database was using 77 TB? What was the output?
  • What command did you run to see that only 5 TB of table data was in use? What was the output?

 

Check your trash directories as well; they can consume quite a lot of space.
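
A quick way to check this might look like the sketch below. It assumes the default trash location of `/user/<username>/.Trash` and that the first column of `hdfs dfs -du -s` output is the size in bytes before replication. The `total_bytes` helper and the `RUN_ON_CLUSTER` switch are names invented for this example.

```shell
# Sum the first column (size in bytes) of `hdfs dfs -du -s` output read from stdin.
total_bytes() {
  awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# RUN_ON_CLUSTER is a hypothetical switch for this example; set it to 1 on a
# node with the hdfs client. The glob is quoted so that HDFS expands it,
# not the local shell.
if [ "${RUN_ON_CLUSTER:-0}" = "1" ]; then
  hdfs dfs -du -s '/user/*/.Trash' | total_bytes
fi
```

If the total is large, `hdfs dfs -expunge` removes trash checkpoints older than the configured retention for the user running it. Files in trash stay recoverable until then, so check with the owners before clearing anything.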