New Contributor
Posts: 3
Registered: 11-03-2014
Accepted Solution

Region stop doing split/compaction after bulkload data to HBase every 3 minutes

Hi all,

When I use bulk load to load 2~3 GB of data into a table every 3 minutes, I see some regions stop splitting and compacting (a split may start but has not finished after 1 hour). Eventually some orphan HFiles are left in HBase, which causes some data loss. Does anyone know whether this is a bug that shows up when bulk loading too often? Thanks!

Platform: CDH4
HBase version: 0.94
HBase region size: 18 GB
Bulk load frequency: 3 GB of data loaded into a single table every 3 minutes

Posts: 416
Topics: 51
Kudos: 86
Solutions: 49
Registered: 06-26-2013

Re: Region stop doing split/compaction after bulkload data to HBase every 3 minutes

Frequent bulk loads are not a good long-term use case for HBase, for the exact reasons you mentioned. The compaction queue and data maintenance overhead never catch up.

Can you either A) stream the writes into HBase continuously using puts, or B) hold off on the bulk loads until the HFiles reach the full 18 GB (i.e., one region) in size? That would at least reduce the compactions and splits.
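
To put rough numbers on option B, here is a minimal back-of-the-envelope sketch in Python. It only uses the figures from the original post (18 GB regions, about 3 GB arriving every 3 minutes); the function names are made up for illustration, and real HFile sizes will vary with compression and key distribution.

```python
import math

def loads_per_batch(region_gb: float, load_gb: float) -> int:
    """How many incoming loads to accumulate before one region-sized bulk load."""
    return math.ceil(region_gb / load_gb)

def bulk_loads_per_day(interval_min: float) -> int:
    """How many bulk loads run per day at a given interval in minutes."""
    return int(24 * 60 // interval_min)

REGION_GB = 18     # region size, from the original post
LOAD_GB = 3        # data arriving per interval, from the original post
INTERVAL_MIN = 3   # current bulk load interval, from the original post

batch = loads_per_batch(REGION_GB, LOAD_GB)
hold_minutes = batch * INTERVAL_MIN

print("loads to accumulate:", batch)
print("minutes between bulk loads:", hold_minutes)
print("bulk loads per day now:", bulk_loads_per_day(INTERVAL_MIN))
print("bulk loads per day if batched:", bulk_loads_per_day(hold_minutes))
```

Under these assumptions, batching to one region's worth of data means one bulk load every 18 minutes instead of every 3, so far fewer new HFiles for the compaction queue to absorb.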

New Contributor
Posts: 3
Registered: 11-03-2014

Re: Region stop doing split/compaction after bulkload data to HBase every 3 minutes

Thanks for your reply. I have revised the solution to write into HBase using puts. It now works without the problem mentioned before.
Explorer
Posts: 20
Registered: 07-29-2013

Re: Region stop doing split/compaction after bulkload data to HBase every 3 minutes

I agree with Clint: bulk loading into HBase every 3 minutes is too often and will cause a ton of compactions. To remedy the splits, you should have an overall understanding of what your data will look like 6 months to 1 year from now and pre-split the table when you create it. This should give you enough regions to load all of your data without having to split every time. This is a best practice for puts as well. Also, with regard to bulk loading, early versions of CDH4 had some issues with sequence numbers, and I would advise moving to CDH 5.1.3.
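
As a rough illustration of the pre-split idea, the Python sketch below estimates a region count from a projected table size and generates evenly spaced split points. It assumes row keys start with a uniformly distributed hex prefix (for example a hashed or salted key), which may not match your schema. The function names and the 1800 GB projection are hypothetical.

```python
import math

def regions_needed(projected_gb: float, region_gb: float) -> int:
    """Estimate how many regions the projected data would fill."""
    return max(1, math.ceil(projected_gb / region_gb))

def hex_split_keys(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 evenly spaced split points over a hex key space.

    Assumes row keys begin with a uniformly distributed, fixed-width,
    lowercase hex prefix; skewed keys need split points chosen from real data.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 16 ** width
    return [format(space * i // num_regions, f"0{width}x")
            for i in range(1, num_regions)]

REGION_GB = 18        # region size, from the original post
PROJECTED_GB = 1800   # hypothetical table size 6 to 12 months from now

n = regions_needed(PROJECTED_GB, REGION_GB)
splits = hex_split_keys(n)
print(n, "regions;", len(splits), "split points; first:", splits[0])
```

The resulting keys are what you would hand over as the split points when creating the table (for instance through the SPLITS option of the HBase shell `create` command), so the regions exist before the first load arrives.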
