Created 09-06-2016 02:03 PM
Hi all,
we have a customer that is using HBase and has a pretty strange loading pattern.
They use BulkLoad to load around 120 MB of data every 5-10 secs. The table is NOT pre-split and has 7 column families, of which only 2-3 are populated. What happens is that data goes into a single region initially and the region goes way beyond the split threshold (10 GB or R^2 * flush size; they are using the default split policy). I saw a region as big as 2.2 TB, with constant compactions that take 4-5 hrs. The row key is also sequential, which again casts a shadow on the application, but the customer is reluctant to change anything. I am sure that even if the region were split they would have an issue with hotspotting.
Does frequent BulkLoad in combination with a sequential row key, apart from being a terrible practice for HBase, affect splitting? Any suggestions?
Regards,
Dino
Created 09-06-2016 03:08 PM
"What happens is that data goes into a single region initially and the region goes way beyond the split threshold (10 GB or R^2 * flush size; they are using the default split policy). I saw a region as big as 2.2 TB, with constant compactions that take 4-5 hrs."
This seems very bad. There should be back-pressure (e.g. max number of files or something) that prevents a region from growing this large without a split happening.
Created 09-06-2016 04:09 PM
Yes, I found this https://issues.apache.org/jira/browse/HBASE-12657 . In the ticket you can see the following:
"Lowest sequence ID among all store files in a region is the reason that reference files are constantly getting removed from compaction selections if there are newer files in a compaction queue. This is what is happening under high load when there are too many minor compaction requests in a queue, reference files do not have a chance to be compacted. Interestingly, that current 0.94 and 0.98 code have different issues here and require different patches."
The HBase version in place is 1.1.11.x.
The compaction queue usually holds around 60-80 entries.
Created 09-06-2016 06:09 PM
Can you attach the region server log for the server that hosts the single large region?
There should be some clue in the region server log.
Created 09-06-2016 06:10 PM
How many regions are there for this table?
What is the value of hbase.hregion.max.filesize?
Thanks
Created 09-06-2016 09:44 PM
In the meantime they deleted the table and restarted the bulk load with the same frequency and the same row keys. The region grew to 220 GB and compactions were queueing up. Splits are not triggered. The loaded files were around 120 MB each, so there are a lot of files to compact.
hbase.hregion.max.filesize is set to 10GB
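To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes every bulk load adds about 120 MB and that nothing has been compacted away, so the counts are only estimates; the function names are made up for illustration and are not part of any HBase API.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def loads_per_hour(interval_seconds):
    """Number of bulk loads per hour at a fixed interval."""
    return 3600 // interval_seconds

def estimated_load_count(region_bytes, bytes_per_load):
    """Approximate number of bulk loads that make up the region's data."""
    return region_bytes // bytes_per_load

def oversize_factor(region_bytes, max_filesize_bytes):
    """How many times larger the region is than the configured limit."""
    return region_bytes / max_filesize_bytes

region_bytes = 220 * GB
print(loads_per_hour(5), loads_per_hour(10))         # 720 360
print(estimated_load_count(region_bytes, 120 * MB))  # 1877
print(oversize_factor(region_bytes, 10 * GB))        # 22.0
```

Since each bulk load places at least one file per populated column family, the real file count before compaction is probably higher than the load count shown here.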
Created 09-06-2016 09:50 PM
A region will not split if it still has "reference files". A reference file is a soft link to one half of some other HFile. When a region splits, the two daughter regions are created with reference files that point to the parent's HFiles. This scheme keeps region splits quick, because the region's data is never rewritten during the split. A region will NEVER split while it still has reference files referring to its parent.
Reference files normally get cleaned out by compaction. Once a compaction rewrites the data, the reference files are deleted and the region can split again. If your use case bulk loads every 10 seconds or so, there are probably a lot of small files being written and immense compaction pressure, so compactions do not get through and further splits of the region are blocked.
I recommend reducing the frequency of bulk loads, raising the compaction file limits, and in general keeping the compaction queue from filling up.
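One possible way to reduce the bulk load frequency is to stage the incoming data and only load once a batch is large enough. The following is a minimal sketch in Python, not the customer's actual pipeline: the class name, the threshold, and the file names are hypothetical, and the HFile generation and the bulk load call itself are deliberately left out.

```python
MB = 1024 ** 2

class BulkLoadBatcher:
    """Collects staged inputs and hands back a batch once it is large enough."""

    def __init__(self, min_batch_bytes):
        self.min_batch_bytes = min_batch_bytes
        self.pending = []  # list of (path, size_bytes) tuples

    def add(self, path, size_bytes):
        """Stage one input. Returns the batch to load when the size
        threshold is reached, otherwise None."""
        self.pending.append((path, size_bytes))
        if sum(size for _, size in self.pending) >= self.min_batch_bytes:
            batch, self.pending = self.pending, []
            return batch
        return None

# Hypothetical usage: 120 MB inputs, one bulk load per ~1.2 GB of data.
batcher = BulkLoadBatcher(min_batch_bytes=1200 * MB)
batches = [batcher.add(f"input-{i}", 120 * MB) for i in range(20)]
ready = [b for b in batches if b is not None]
print(len(ready), [len(b) for b in ready])
```

With these assumed numbers, twenty 120 MB inputs result in two bulk loads of ten inputs each instead of twenty separate loads, which should leave far fewer files for compaction to work through.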
Created 09-07-2016 08:41 AM
Thanks for confirming. The behaviour seems to match. The customer will have to revise the bulk loading procedures and rowkey design in order to have a more stable environment.
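For the row key part, one commonly used option (not something proposed earlier in this thread) is to salt the sequential key with a hash-derived bucket prefix and pre-split the table on those prefixes. The sketch below assumes a two-digit prefix and a made-up bucket count; `salted_key` and `split_points` are illustrative helpers, not HBase functions.

```python
import hashlib

def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with a stable bucket id derived from its hash,
    so that sequential keys are spread over several key ranges."""
    bucket = hashlib.md5(row_key).digest()[0] % buckets
    return f"{bucket:02d}-".encode() + row_key

def split_points(buckets: int) -> list:
    """Split keys that would give one region per bucket when pre-splitting."""
    return [f"{i:02d}-".encode() for i in range(1, buckets)]

# Hypothetical usage with 8 buckets and sequential keys.
for seq in range(3):
    print(salted_key(f"{seq:010d}".encode(), buckets=8))
print(split_points(4))
```

The trade-off is that a range scan over the original key order has to be issued once per bucket and merged on the client side, so this only makes sense if the read pattern can tolerate it.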