HBase BulkLoad - Region Split behaviour

Explorer

Hi all,

We have a customer that is using HBase and has a pretty strange loading pattern.

They use BulkLoad to load around 120 MB of data every 5-10 seconds. The table is NOT pre-split and has 7 column families, of which only 2-3 are populated. The data initially goes into a single region, and that region grows far beyond the split threshold (10 GB or R^2 * flush size; they are using the default split policy). I saw a region as big as 2.2 TB, with constant compactions that take 4-5 hours. The row key is also sequential, which again casts a shadow on the application, but the customer is reluctant to change anything. I am sure that even if the region were split, they would still have an issue with hotspotting.
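To put rough numbers on this, here is a small sketch that evaluates the split threshold as quoted above, min(R^2 * flush size, max file size), and how quickly 120 MB loads reach it. The 128 MB flush size is an assumption (the stock hbase.hregion.memstore.flush.size default), the function names are made up for illustration, and the exact formula may differ between HBase releases, so treat it as a back-of-the-envelope check and not as what the region server actually computes.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def split_threshold(regions_on_server, flush_size=128 * MB, max_filesize=10 * GB):
    """Region size at which a split is requested, per the formula quoted in the post.

    regions_on_server is R: regions of this table hosted on the same server.
    flush_size is an assumed default, not a value taken from the customer's cluster.
    """
    return min(regions_on_server ** 2 * flush_size, max_filesize)


def seconds_to_reach(threshold_bytes, load_bytes=120 * MB, interval_s=10):
    """Time for periodic bulk loads of load_bytes to reach threshold_bytes."""
    loads = -(-threshold_bytes // load_bytes)  # ceiling division
    return loads * interval_s


if __name__ == "__main__":
    for r in (1, 2, 4, 9):
        size = split_threshold(r)
        print(f"R={r}: threshold {size / MB:.0f} MB, "
              f"reached after ~{seconds_to_reach(size)} s of loading")
```

With a single region the threshold under these assumptions is only 128 MB, so the table should ask for its first split after the second load; a region of several hundred GB means split requests are not being acted on, not that the threshold is too high.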

Does the frequent BulkLoad in combination with a sequential row key, apart from being a terrible practice for HBase, affect splitting? Any suggestions?

Regards,

Dino

1 ACCEPTED SOLUTION

Guru

A region will not split if it still contains "reference files". A reference file is a soft link to "half of" some other HFile. When a region splits, the two daughter regions are created with reference files that point back to the parent's files. This scheme keeps region splits quick, because a split never rewrites the region's data. A region will NEVER split while it still has reference files referring to its parent.

Reference files normally get cleaned out by compaction. Once a compaction rewrites the data, the reference files are deleted and the region can split again. If your use case bulk loads every 10 seconds or so, there are probably A LOT of small files being written and immense compaction pressure, so compactions do not get through and further splits of the region are blocked.

I recommend reducing the frequency of bulk loads, raising the compaction file limits, and in general keeping the compaction queue from filling up.
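The rule above can be illustrated with a toy model. This is plain Python, not HBase code: the class and method names are hypothetical, and the compaction here simply rewrites the oldest files, which is a simplification of the real selection logic. It only shows why a daughter region stays unsplittable until compaction has rewritten every reference file.

```python
class StoreFile:
    def __init__(self, name, is_reference=False):
        self.name = name
        self.is_reference = is_reference  # True for a link to half of a parent file


class Region:
    def __init__(self, files=None):
        self.files = list(files or [])  # ordered oldest first

    def can_split(self):
        # A region holding any reference file is not allowed to split.
        return not any(f.is_reference for f in self.files)

    def bulk_load(self, name):
        # A bulk load adds a finished file directly, creating more compaction work.
        self.files.append(StoreFile(name))

    def compact(self, max_files):
        # Simplified compaction: rewrite up to max_files of the oldest files
        # into one regular (non-reference) file.
        selected, rest = self.files[:max_files], self.files[max_files:]
        if selected:
            rest.insert(0, StoreFile("compacted"))
        self.files = rest


if __name__ == "__main__":
    daughter = Region([StoreFile(f"ref-{i}", is_reference=True) for i in range(3)])
    daughter.bulk_load("load-1")
    print(daughter.can_split())   # False: references still present
    daughter.compact(max_files=2)
    print(daughter.can_split())   # False: one reference left
    daughter.compact(max_files=2)
    print(daughter.can_split())   # True: all references rewritten
```

If compactions are stuck in a long queue, the second and third steps never happen and the region keeps growing with every load.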

REPLIES

Super Guru

"What happens is that data goes into a single region initially and the region goes way beyond the split threshold (10 GB or R^2 * flush size - they are using default split policy), I saw a region as big as 2.2 TB with constant compactions that take 4-5 hrs."

This seems very bad. There should be back-pressure (e.g. max number of files or something) that prevents a region from growing this large without a split happening.

Explorer

Yes, I found this: https://issues.apache.org/jira/browse/HBASE-12657. The ticket says the following:

"Lowest sequence ID among all store files in a region is the reason that reference files are constantly getting removed from compaction selections if there are newer files in a compaction queue. This is what is happening under high load when there are too many minor compaction requests in a queue, reference files do not have a chance to be compacted. Interestingly, that current 0.94 and 0.98 code have different issues here and require different patches."

The HBase version in place is 1.1.11.x.

The compaction queue usually holds around 60-80 entries.

Master Collaborator

Can you attach the region server log for the server that hosts the single large region?

There should be some clue in the region server log.

Master Collaborator

How many regions are there for this table?

What's the value of hbase.hregion.max.filesize?

Thanks

Explorer

They have now deleted the table and started a new bulk load with the same frequency and the same row keys. The region grew to 220 GB and the compactions were queueing up. Splits are not triggered. The loaded files were around 120 MB each, so there are a lot of files to compact.
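As a rough sense of scale, the sketch below counts how many 120 MB files fit in a 220 GB region and the minimum number of compactions needed to merge them into one file. It assumes each load lands as a single 120 MB file (in practice a bulk load writes one HFile per populated column family) and that a compaction takes at most 10 files, the stock hbase.hstore.compaction.max default; both function names are made up for illustration.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def files_in_region(region_bytes, file_bytes=120 * MB):
    """Number of equally sized files needed to hold region_bytes."""
    return math.ceil(region_bytes / file_bytes)


def min_compactions_to_one_file(n_files, max_files_per_compaction=10):
    """Lower bound on compactions needed to merge n_files into a single file.

    Each compaction replaces up to max_files_per_compaction files with one,
    so it reduces the file count by at most max_files_per_compaction - 1.
    """
    if n_files <= 1:
        return 0
    return math.ceil((n_files - 1) / (max_files_per_compaction - 1))


if __name__ == "__main__":
    n = files_in_region(220 * GB)
    print(n, "files,", min_compactions_to_one_file(n), "compactions at minimum")
```

This is only a lower bound, since later compactions rewrite ever larger files while new loads keep arriving.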

hbase.hregion.max.filesize is set to 10GB

Explorer

Thanks for confirming. The behaviour seems to match. The customer will have to revise the bulk loading procedures and rowkey design in order to have a more stable environment.