Support Questions

Regions stop splitting/compacting after bulk loading data into HBase every 3 minutes

New Contributor

Hi all,

When I use bulk load to load 2-3 GB of data into a table every 3 minutes, I see some regions stall while splitting or compacting (a split may start but still not finish after an hour). Eventually some orphaned HFiles are left behind in HBase, which leads to data loss. Does anyone know if this is a bug triggered by bulk loading too often? Thanks!

Platform: CDH4
HBase version: 0.94
HBase region size: 18 GB
Bulk load frequency: 3 GB of data loaded into a single table every 3 minutes

Accepted Solutions

Re: Regions stop splitting/compacting after bulk loading data into HBase every 3 minutes

Master Collaborator

Frequent bulk loads are not a good long-term use case for HBase, for exactly the reasons you mentioned: the compaction queue and the data-maintenance overhead never catch up.

 

Can you either A) just stream the writes into HBase constantly using puts, or B) hold off on the bulk loads until the HFiles are full 18GB (eg. one region) in size?  At least reduce the compactions and splits?
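Option B can be sketched as a small accumulator that stages prepared HFile batches and only triggers a bulk load once their combined size reaches the region size. With the figures from this thread (3 GB every 3 minutes, 18 GB regions), that means one load every 6 batches, i.e. every 18 minutes instead of every 3. This is a minimal, stdlib-only sketch: the class `BulkLoadAccumulator` and its methods are hypothetical, and the `flush()` step is a placeholder where a real job would hand the staged HFile directory to the bulk-load tool (in HBase 0.94, `LoadIncrementalHFiles` / `completebulkload`).

```java
/**
 * Hypothetical accumulator for option B: stage prepared HFile batches and
 * only trigger a bulk load once the staged data reaches a target size
 * (here, one region's worth). Stdlib-only; no HBase calls are made.
 */
public class BulkLoadAccumulator {
    public static final long GB = 1024L * 1024L * 1024L;

    private final long thresholdBytes;
    private long stagedBytes = 0;
    private int stagedBatches = 0;
    private int loadsTriggered = 0;

    public BulkLoadAccumulator(long thresholdBytes) {
        if (thresholdBytes <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.thresholdBytes = thresholdBytes;
    }

    /** Stages one batch; returns true if this batch caused a bulk load. */
    public boolean stage(long batchBytes) {
        stagedBytes += batchBytes;
        stagedBatches++;
        if (stagedBytes >= thresholdBytes) {
            flush();
            return true;
        }
        return false;
    }

    /**
     * Placeholder for the real load (hypothetical). In an actual job the
     * staged HFile directory would be passed to the bulk-load tool here;
     * this sketch only records the load and resets its counters.
     */
    private void flush() {
        loadsTriggered++;
        stagedBytes = 0;
        stagedBatches = 0;
    }

    public int loadsTriggered() { return loadsTriggered; }
    public long stagedBytes() { return stagedBytes; }
    public int stagedBatches() { return stagedBatches; }

    public static void main(String[] args) {
        // Figures from the thread: 3 GB arrives every 3 minutes; regions are 18 GB.
        BulkLoadAccumulator acc = new BulkLoadAccumulator(18 * GB);
        for (int minute = 3; minute <= 60; minute += 3) {
            if (acc.stage(3 * GB)) {
                System.out.println("bulk load at minute " + minute);
            }
        }
        System.out.println("bulk loads in one hour: " + acc.loadsTriggered());
    }
}
```

Over one simulated hour this triggers three bulk loads (at minutes 18, 36 and 54) instead of twenty, leaving 6 GB staged for the next load. The trade-off is latency: freshly arrived data is not visible in HBase until the next load.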

Re: Regions stop splitting/compacting after bulk loading data into HBase every 3 minutes

New Contributor
Thanks for your reply. I revised the solution to write into HBase using Puts, and it now works without the problem mentioned before.

Re: Regions stop splitting/compacting after bulk loading data into HBase every 3 minutes

Explorer

I agree with Clint: bulk loading into HBase every 3 minutes is too often and will cause a ton of compactions. To avoid the splits, you should have an overall picture of what your data will look like 6 months to 1 year from now and pre-split the table when you create it. This should give you enough regions to load all of your data without having to split every time. This is a best practice for Puts as well. Also, regarding bulk loading, early versions of CDH4 had some issues with sequence numbers, and I would advise moving to CDH 5.1.3.
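Pre-splitting means supplying the region boundaries up front: estimate how many regions the projected data needs, then compute evenly spaced split keys. A minimal stdlib-only sketch, assuming row keys begin with a uniformly distributed 8-character lowercase hex prefix (for example, a hash of the natural key). The class `PreSplitKeys`, its methods and the 1 TB projection are hypothetical. The resulting `byte[][]` has the shape accepted by `HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys)` in the 0.94 client, and HBase's own `RegionSplitter` utility offers a comparable `HexStringSplit` algorithm.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for computing pre-split region boundaries. */
public class PreSplitKeys {

    /** Regions needed to hold the projected data at the given region size, rounded up. */
    public static int regionsNeeded(long projectedBytes, long regionBytes) {
        return (int) ((projectedBytes + regionBytes - 1) / regionBytes);
    }

    /**
     * Returns numRegions - 1 split keys spaced evenly over the 8-hex-digit
     * keyspace 00000000..ffffffff. Assumes row keys start with such a
     * uniformly distributed prefix; otherwise regions will be unbalanced.
     */
    public static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2) {
            return new byte[0][]; // a single region needs no split keys
        }
        long keyspace = 1L << 32; // 16^8 possible 8-digit hex prefixes
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = keyspace * i / numRegions;
            String hex = String.format("%08x", boundary); // zero-padded, lowercase
            splits[i - 1] = hex.getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // Hypothetical projection: 1 TB of data within a year, 18 GB regions.
        int regions = regionsNeeded(1024 * gb, 18 * gb);
        byte[][] splits = hexSplitKeys(regions);
        System.out.println(regions + " regions, " + splits.length + " split keys");
        System.out.println("first split: " + new String(splits[0], StandardCharsets.US_ASCII));
    }
}
```

With the illustrative 1 TB projection this yields 57 regions (56 split keys), so the table starts with all of them instead of one region splitting repeatedly as data arrives. Evenly spaced hex boundaries only balance the load if the real row keys are uniformly distributed over that prefix; with a different key design, the boundaries should be chosen from the actual key distribution.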