I am observing a strange degradation in data compression once the data grows beyond a certain size.
My data consists of 9-byte keys (with one salt byte at the beginning) and 4-byte values. The ingestion rate is about 48 million inserts every 8 minutes, and the key/value pairs change infrequently: only about 2% of the data changes between consecutive iterations. The table has one column family and uses encoding: FAST_DIFF, compression: GZ, and TTL: 30 days.
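To put the ingestion figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the load runs around the clock at a constant rate, which is my simplification, and it counts only the 9 + 4 bytes of key and value, ignoring HBase's per-cell overhead (timestamp, column family, qualifier, length fields). All variable names are illustrative.

```python
# Figures taken from the description above.
INSERTS_PER_ITERATION = 48_000_000
ITERATION_MINUTES = 8
CHANGED_FRACTION = 0.02
KEY_BYTES, VALUE_BYTES = 9, 4

# Assumption: iterations run back to back, 24 hours a day.
iterations_per_day = 24 * 60 // ITERATION_MINUTES

# Total inserts per day and rows that actually differ per iteration.
inserts_per_day = INSERTS_PER_ITERATION * iterations_per_day
changed_per_iteration = round(INSERTS_PER_ITERATION * CHANGED_FRACTION)

# Key + value bytes only; real HBase cells carry additional overhead.
payload_bytes_per_day = inserts_per_day * (KEY_BYTES + VALUE_BYTES)

print(f"iterations/day:        {iterations_per_day}")
print(f"inserts/day:           {inserts_per_day:,}")
print(f"changed rows/iteration: {changed_per_iteration:,}")
print(f"payload GB/day:        {payload_bytes_per_day / 1e9:.1f}")
```

Under these assumptions, roughly 98% of what is written in each iteration repeats the previous iteration, which is why I expected the data to stay highly compressible.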
For the first 10 days, compression looks very good: the compression ratio reported by HBase is on average below 0.1, and the data occupies about 100 GB. Things start to change around day 15, when the compression ratio begins to grow slowly, reaching about 0.75 at day 30, with the data occupying about 2 TB (instead of the roughly 300 GB that the day-10 figures would suggest).
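As a sanity check on these numbers, the sketch below derives the uncompressed volume implied by each observation. It assumes that the ratio HBase reports is compressed size divided by uncompressed size, and that 2 TB means 2000 GB; both are my assumptions, and the function name is hypothetical. Since the day-10 ratio is "less than 0.1", the day-10 result is a lower bound.

```python
def implied_uncompressed_gb(on_disk_gb: float, ratio: float) -> float:
    """Uncompressed size implied by an on-disk size and a compression ratio.

    Assumes ratio = compressed size / uncompressed size.
    """
    return on_disk_gb / ratio


# Observations from the description above.
day10_uncompressed = implied_uncompressed_gb(100, 0.1)    # about 1000 GB
day30_uncompressed = implied_uncompressed_gb(2000, 0.75)  # about 2667 GB

# What day 30 would hold if the uncompressed volume grew linearly from day 10.
day30_linear_estimate = day10_uncompressed * 30 / 10      # about 3000 GB

print(f"day 10 implied uncompressed: {day10_uncompressed:.0f} GB")
print(f"day 30 implied uncompressed: {day30_uncompressed:.0f} GB")
print(f"day 30 linear extrapolation: {day30_linear_estimate:.0f} GB")
```

If those assumptions hold, the implied uncompressed volume at day 30 is in the same range as a linear extrapolation from day 10, so the amount of raw data appears to grow about as expected, and it is the compression effectiveness that changes.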
HBase version: 1.2.0-cdh5.9.0
Has anyone encountered a similar problem before? If so, what is the root cause, and how can it be handled?