Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Significant HBASE table storage utilization reduction AFTER exportSnapshot

Explorer

We noticed a ridiculously large reduction in a table's size after its snapshot was exported to another cluster.

old-cluster - table is 145.8 GB

new-cluster - table is 54.8 MB

The two clusters are configured similarly (same number of region servers, etc.). The numbers above reflect table size, not including replication.

One of the reasons we are moving to the new-cluster is for additional storage: our data is growing quickly, and we have found that HBase requires 50% free space to do any compaction. I assume some, if not all, of the difference in table size is due to compaction, but I'm surprised at how large the difference is, and I'm wondering whether there is something we could do on the working cluster to avoid such wasted space. Note that we load data in bulk once a day; it is not updated in real time.

I realize this post is a little vague. My apologies.

1 ACCEPTED SOLUTION

Explorer

Cloudera said I had a new ranking: "idiot". My apologies for not realizing this sooner. I was running

hdfs dfs -du -h -s hdfs://server:8020/hbase/data/default/*

and expecting to see the size of the "restored" snapshot, when apparently that directory only holds information pointing to /hbase/archive/data/default. So, for anyone interested: if you are using snapshots, it looks like you need to look in BOTH directories and add them together to find the REAL table size.
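To apply this without eyeballing two `-h` listings, the two directories can be summed in bytes. This is a minimal Python sketch, not a Cloudera or HBase tool. It assumes the `hdfs` CLI is on the PATH, that the table lives in the `default` namespace under an HBase root of `/hbase` (as in the paths above), and that the archived files sit under the same table name as the live table. `parse_du_bytes`, `du_bytes`, and `table_size_bytes` are hypothetical helper names. It runs `-du -s` without `-h` so each size comes back as a plain byte count.

```python
import subprocess

# Paths taken from the post; assumes the default namespace and an HBase
# root directory of /hbase on the cluster being measured.
DATA_DIR = "/hbase/data/default"
ARCHIVE_DIR = "/hbase/archive/data/default"


def parse_du_bytes(du_output: str) -> int:
    """Sum the first column of `hdfs dfs -du -s` output.

    Without -h the first column is the size in bytes before HDFS
    replication; any further columns and the path are ignored.
    """
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        if fields:
            total += int(fields[0])
    return total


def du_bytes(path: str) -> int:
    """Return the size in bytes of `path`, or 0 if `hdfs dfs -du` fails.

    A non-zero exit usually means the path does not exist (e.g. nothing
    archived for this table), but every failure is mapped to 0 here, so
    check the command by hand if the totals look wrong.
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return 0
    return parse_du_bytes(result.stdout)


def table_size_bytes(table: str) -> int:
    """Files under data/ (per the post, mostly pointers after a restore)
    plus the files kept under archive/ for the same table name."""
    return du_bytes(f"{DATA_DIR}/{table}") + du_bytes(f"{ARCHIVE_DIR}/{table}")
```

For example, `table_size_bytes("my_table")` (a hypothetical table name) returns the combined byte count; divide by `1024**3` for GiB. The sketch avoids `-h` because human-readable sizes such as `54.8 M` would need unit parsing before they could be added together.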

Cloudera didn't really call me that 🙂

2 REPLIES

Community Manager

Thanks for the laugh @keeblerh. Don't be so hard on yourself, and thanks for sharing the solution. 🙂


Keep the questions coming,

Cy Jervis | Senior Manager, Knowledge Programs

if (helpful) { mark_as_solution(); } | if (appreciated) { give_kudos(); }