Significant HBASE table storage utilization reduction AFTER exportSnapshot

Explorer

We noticed a ridiculous improvement in a table's size after its snapshot was exported to another cluster.

old-cluster - table is 145.8 GB

new-cluster - table is 54.8 MB

The two clusters are configured similarly: same number of region servers, etc. The numbers above reflect table size, not including replication.

One of the reasons we are moving to the new-cluster is for additional storage, because our data is growing so quickly and we have found that HBase requires 50% free space to do any compaction. I assume some, if not all, of the difference in table size is due to compaction, but I'm surprised at the huge difference in the sizes and am wondering if there is something we could do to improve the working cluster to avoid such wasted space. Note that we load data in bulk once a day. It is not updated in real time.

I realize this post is a little obtuse.  My apologies.

1 ACCEPTED SOLUTION

Explorer

Cloudera said I had a new ranking "idiot".  My apologies for not realizing this sooner.  I was doing a

hdfs dfs -du -h -s hdfs://server:8020/hbase/data/default/*

and expected to see the results of the "restored" snapshot, when apparently that directory only holds information pointing to /hbase/archive/data/default. So, for anyone who is interested: it looks like if you are using snapshots, to find out the REAL table size you need to look in BOTH directories and add them together.

Cloudera didn't really call me that 🙂
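
For anyone who wants to script the "look in BOTH directories and add them together" step, here is a minimal sketch. It assumes the directory layout mentioned above (data/<namespace>/<table> and archive/data/<namespace>/<table> under the HBase root) and that the first column of `hdfs dfs -du -s` output is the size in bytes before replication. The function names `sum_du_bytes` and `table_size_bytes` are made up for this example, and the result should be treated as a rough estimate rather than an exact accounting.

```shell
#!/bin/sh
# Sketch: total bytes for one HBase table across the live and archive directories.
# Assumption: the first column of `hdfs dfs -du -s` output is the size in bytes.

# Sum the first column of whatever `du` lines arrive on stdin.
sum_du_bytes() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Hypothetical helper: $1 = HBase root dir, $2 = namespace, $3 = table name.
table_size_bytes() {
  {
    hdfs dfs -du -s "$1/data/$2/$3"
    hdfs dfs -du -s "$1/archive/data/$2/$3"
  } | sum_du_bytes
}
```

It could be called as `table_size_bytes hdfs://server:8020/hbase default mytable`, where `mytable` is a placeholder table name. If one of the two directories does not exist, `hdfs` prints an error for that path and only the other directory is counted.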


2 REPLIES

Community Manager

Thanks for the laugh @keeblerh. Don't be so hard on yourself and thanks for sharing the solution. 🙂


Cy Jervis, Manager, Community Program