We noticed a ridiculous improvement in a table's size after its snapshot was exported to another cluster.
old-cluster - table is 145.8 GB
new-cluster - table is 54.8 MB
The two clusters are configured similarly (same number of region servers, etc.). The numbers above reflect table size not including replication.
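To put the gap in perspective, here is a quick back-of-the-envelope calculation of the ratio between the two sizes. It is only a sketch: the function name `size_ratio` is made up for illustration, and it assumes the reported sizes use binary units (1 GB = 1024 MB), which I have not confirmed.

```python
# Rough ratio between the two reported table sizes.
# Assumption: binary units (1 GB = 1024 MB); pass mb_per_gb=1000 for decimal units.
MB_PER_GB = 1024


def size_ratio(old_gb: float, new_mb: float, mb_per_gb: int = MB_PER_GB) -> float:
    """Return how many times larger the old table is than the new one."""
    return (old_gb * mb_per_gb) / new_mb


if __name__ == "__main__":
    ratio = size_ratio(145.8, 54.8)
    print(f"old table is roughly {ratio:,.0f}x larger than the new one")
```

Either way the units are read, the old table comes out well over two thousand times larger than the exported copy.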
One of the reasons we are moving to the new-cluster is for additional storage: our data is growing quickly, and we have found that HBase requires 50% free space to do any compaction. I assume some, if not all, of the difference in table size is due to compaction, but I'm surprised at how large the difference is, and I'm wondering if there is something we could do on the working cluster to avoid such wasted space. Note that we load data in bulk once a day; it is not updated in real time.
I realize this post is a little vague. My apologies.