HBase data compression problem

I am observing a strange degradation in data compression once the data grows beyond a certain size.


My data consists of 9-byte keys (with one salt byte at the beginning) and 4-byte values.
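
To make the layout concrete, here is a minimal Python sketch of how such a row key and value might be encoded. The post does not say how the salt byte is derived or what the remaining 8 key bytes hold, so the 64-bit big-endian ID, the `crc32`-based salt, and the bucket count are assumptions for illustration; `make_key`, `make_value`, and `NUM_SALT_BUCKETS` are hypothetical names.

```python
import struct
import zlib

# Hypothetical bucket count; the post does not state how many salt values are used.
NUM_SALT_BUCKETS = 16


def make_key(entity_id: int) -> bytes:
    """Build a 9-byte row key: 1 salt byte + 8-byte big-endian ID (assumed layout)."""
    id_bytes = struct.pack(">Q", entity_id)
    # Derive the salt from the ID so the same ID always lands in the same bucket.
    salt = zlib.crc32(id_bytes) % NUM_SALT_BUCKETS
    return struct.pack(">B", salt) + id_bytes


def make_value(metric: int) -> bytes:
    """Encode the value as a 4-byte big-endian unsigned integer (assumed encoding)."""
    return struct.pack(">I", metric)


key = make_key(123456789)
value = make_value(42)
print(len(key), len(value))  # 9 4
```

Deriving the salt deterministically from the ID keeps reads by ID possible (recompute the salt, then do a point get) while still spreading sequential IDs across regions.
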
The data ingestion rate is about 48 million inserts every 8 minutes, and the key/value pairs change infrequently: only about 2% of the data changes between consecutive iterations. The table has one column family and uses FAST_DIFF data block encoding, GZ compression, and a TTL of 30 days.
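
For scale, the reported rate works out as follows. This is a back-of-the-envelope sketch in Python that reads "mln" as million and assumes the rate is constant around the clock. The last figure counts cells written over one TTL window, not cells stored: older versions of unchanged cells may be discarded during compaction depending on the column family's VERSIONS setting, which the post does not state.

```python
# Reported workload (from the post); "mln" is read as million.
INSERTS_PER_ITERATION = 48_000_000
ITERATION_MINUTES = 8
CHANGED_FRACTION = 0.02
TTL_DAYS = 30

# Assumes iterations run back to back, 24 hours a day.
inserts_per_second = INSERTS_PER_ITERATION / (ITERATION_MINUTES * 60)
iterations_per_day = 24 * 60 // ITERATION_MINUTES
inserts_per_day = INSERTS_PER_ITERATION * iterations_per_day
changed_per_iteration = round(INSERTS_PER_ITERATION * CHANGED_FRACTION)
# Total writes over one TTL window (written, not necessarily retained).
cells_written_within_ttl = inserts_per_day * TTL_DAYS

print(f"{inserts_per_second:,.0f} inserts/s")  # 100,000 inserts/s
print(f"{inserts_per_day:,} inserts/day")  # 8,640,000,000 inserts/day
print(f"{changed_per_iteration:,} changed cells per iteration")  # 960,000 changed cells per iteration
print(f"{cells_written_within_ttl:,} cells written per {TTL_DAYS}-day window")  # 259,200,000,000 cells written per 30-day window
```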

After the first 10 days, compression looks very good: the compression ratio reported by HBase averages less than 0.1, and the data occupies about 100 GB.
Things start to change after about day 15, when the compression ratio begins to grow slowly, reaching about 0.75 by day 30. At that point the data occupies about 2 TB instead of the roughly 300 GB that the day-10 figure would suggest.
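
As a rough consistency check on these numbers, the Python sketch below assumes a constant ingestion rate (so the uncompressed volume at day 30 is three times that at day 10), reads 2 TB as 2000 GB, and treats "less than 0.1" as exactly 0.1. Under those assumptions, the rise in the reported compression ratio alone would predict roughly 2.25 TB, close to the observed 2 TB, so the extra space appears to track the ratio change rather than extra data.

```python
# Figures reported in the post (sizes in GB; 2 TB taken as 2000 GB).
size_day10_gb = 100
ratio_day10 = 0.1  # reported as "on average less than 0.1"; treated as 0.1 here
ratio_day30 = 0.75
observed_day30_gb = 2000

# Linear extrapolation, assuming constant ingestion and a constant compression ratio.
expected_day30_gb = size_day10_gb * 30 / 10
# If only the compression ratio changed, compressed size scales with the ratio.
ratio_growth = ratio_day30 / ratio_day10
predicted_from_ratio_gb = expected_day30_gb * ratio_growth

print(expected_day30_gb)  # 300.0
print(round(ratio_growth, 2))  # 7.5
print(round(predicted_from_ratio_gb))  # 2250
print(round(observed_day30_gb / expected_day30_gb, 2))  # 6.67
```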

HBase version: 1.2.0-cdh5.9.0

Has anyone encountered a similar problem? If so, what was the root cause, and how did you handle it?
