Member since
05-13-2016
09-07-2016
08:41 AM
Thanks for confirming. The behaviour seems to match. The customer will have to revise the bulk loading procedures and rowkey design in order to have a more stable environment.
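For the rowkey redesign, one common option is to salt the sequential key with a hash-derived bucket prefix and pre-split the table on those prefixes. The following is only a minimal sketch of that idea, not the customer's actual design: the bucket count, the `|` separator and both function names are hypothetical, and it does not call any HBase API.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count, not taken from the thread


def salted_rowkey(sequential_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a sequential key with a stable, hash-derived bucket id."""
    digest = hashlib.md5(sequential_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    # Zero-padded prefix so keys sort by bucket first, then by original key.
    return f"{bucket:02d}|{sequential_key}".encode("utf-8")


def split_points(num_buckets: int = NUM_BUCKETS) -> list:
    """Split keys for pre-creating one region per bucket."""
    return [f"{b:02d}|".encode("utf-8") for b in range(1, num_buckets)]
```

With this layout consecutive keys land in different buckets, so bulk-loaded files are spread over several regions instead of one. The trade-off is that range scans over the original key order have to fan out across all buckets.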
09-06-2016
09:44 PM
They have since deleted the table and restarted the bulk load with the same frequency and the same row keys. The region grew to 220 GB and compactions were queueing up. Splits are not being triggered. The loaded files were around 120 MB each, so there are a lot of files to compact. hbase.hregion.max.filesize is set to 10 GB.
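To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes every load adds one 120 MB file to the same region and ignores column families and any merging done by compactions, so the results are upper-bound estimates only. The function names are hypothetical.

```python
import math

FILE_SIZE_MB = 120      # approximate size of each bulk-loaded file
REGION_SIZE_GB = 220    # observed region size
MAX_FILESIZE_GB = 10    # hbase.hregion.max.filesize


def files_needed(size_gb: float, file_mb: float = FILE_SIZE_MB) -> int:
    """Number of file_mb-sized files needed to reach size_gb."""
    return math.ceil(size_gb * 1024 / file_mb)


def seconds_to_reach(size_gb: float, interval_s: float) -> float:
    """Time to reach size_gb when one file is loaded every interval_s seconds."""
    return files_needed(size_gb) * interval_s


print(files_needed(REGION_SIZE_GB))           # files behind a 220 GB region
print(seconds_to_reach(MAX_FILESIZE_GB, 5))   # fastest load rate
print(seconds_to_reach(MAX_FILESIZE_GB, 10))  # slowest load rate
```

Under these assumptions the 10 GB limit is crossed after 86 loads, which is roughly 7 to 14 minutes at one load every 5-10 seconds, and a 220 GB region corresponds to almost 1,900 uncompacted files.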
09-06-2016
04:09 PM
Yes, I found this https://issues.apache.org/jira/browse/HBASE-12657 . In the ticket you can see the following: "Lowest sequence ID among all store files in a region is the reason that reference files are constantly getting removed from compaction selections if there are newer files in a compaction queue. This is what is happening under high load when there are too many minor compaction requests in a queue, reference files do not have a chance to be compacted. Interestingly, that current 0.94 and 0.98 code have different issues here and require different patches."
The HBase version in place is 1.1.11.x. The compaction queue usually holds around 60-80 entries.
09-06-2016
02:03 PM
Hi all, we have a customer that is using HBase and has a pretty strange loading pattern. They use BulkLoad to load around 120 MB of data every 5-10 seconds. The table is NOT pre-split and has 7 column families, of which only 2-3 are populated. What happens is that the data initially goes into a single region, and that region grows far beyond the split threshold (10 GB or R^2 * flush size; they are using the default split policy). I saw a region as big as 2.2 TB, with constant compactions that take 4-5 hours. The rowkey is also sequential, which again casts a shadow on the application, but the customer is reluctant to change anything. I am sure that even if the region were split they would have an issue with hotspotting. Does frequent BulkLoad in combination with a sequential rowkey, apart from being a terrible practice for HBase, affect splitting? Any suggestions? Regards, Dino
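The split threshold mentioned above can be sketched as follows. This uses the formula exactly as written in the post, min(max file size, R^2 * flush size), where R is taken to be the number of regions of the table on the region server. The 128 MB flush size is an assumption (the default for hbase.hregion.memstore.flush.size), the function name is hypothetical, and the exact formula differs between HBase versions, so treat it as an illustration rather than the policy's real code.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def split_threshold_bytes(regions_on_server: int,
                          flush_size: int = 128 * MB,    # assumed default
                          max_filesize: int = 10 * GB) -> int:
    """Region size at which a split is expected, per the formula in the post."""
    return min(max_filesize, regions_on_server ** 2 * flush_size)


for r in (1, 2, 3, 9):
    print(r, split_threshold_bytes(r) // MB, "MB")

# How far a 2.2 TB region sits above the 10 GB cap.
print(round(2.2 * TB / split_threshold_bytes(9)))
```

With a single region the threshold is only 128 MB, and from nine regions onward it is capped at 10 GB, so a 2.2 TB region is more than 200 times over the largest possible threshold.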
Labels: Apache HBase