I'm importing HFiles into HBase using the command:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles my_table
When I looked at the HBase Master UI, I saw that all the data seems to be stored in a single region:
The HFiles were created by a Spark application, using this command:
JavaPairRDD<String, MyEntry> myPairRDD = ...
myPairRDD.repartitionAndSortWithinPartitions(new HashPartitioner(hbaseRegions));
Why is the data not split across all regions? What am I doing wrong?
You need to implement a salted rowkey to avoid region hotspotting.
If you're planning to use Phoenix for end-user queries,
it is better to use the Phoenix API to insert the data instead of the HBase API, and please refer to the Phoenix documentation on how to implement salting.
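In Phoenix, salting is declared as a table property and the salt byte is handled transparently on reads and writes. A minimal sketch, assuming hypothetical table and column names (only `SALT_BUCKETS` is the Phoenix feature being shown):

```sql
-- SALT_BUCKETS should roughly match the number of region servers;
-- Phoenix prepends and strips the salt byte automatically.
CREATE TABLE my_table (
    rowkey VARCHAR PRIMARY KEY,
    cf.val VARCHAR
) SALT_BUCKETS = 16;
```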
It seems your data is monotonically increasing, so the keys for the data load all belong to a single region, resulting in hotspotting. This is a general problem with any key-value store if the rowkey is not chosen carefully.
If your row key is not already non-monotonic or random in nature, then you should look at hashing your key or salting it (prefixing it with cyclic numbers, although that is not recommended for point lookups).
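As a sketch of the hashing approach: derive a bucket prefix from the key's own hash, so consecutive keys are spread across buckets while point lookups can still recompute the prefix deterministically (range scans, however, must fan out across all buckets). Class, method, and bucket-count names here are illustrative assumptions, not HBase API:

```java
// Illustrative salted-rowkey sketch: prefix each key with a bucket id
// derived from the key's hash, so monotonically increasing keys are
// spread across NUM_BUCKETS key ranges (ideally one per region).
public class SaltedKey {
    static final int NUM_BUCKETS = 16; // tune to your region count

    static String salt(String key) {
        // Mask to keep the hash non-negative before taking the modulus.
        int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_BUCKETS;
        // Two-digit prefix keeps the lexicographic sort order of buckets.
        return String.format("%02d-%s", bucket, key);
    }

    public static void main(String[] args) {
        System.out.println(salt("20230101-event-1"));
        System.out.println(salt("20230101-event-2"));
    }
}
```

Because the prefix is a pure function of the key, a point lookup recomputes `salt(key)` and reads exactly one row; this is the property the round-robin (cyclic-number) variant gives up.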
If you think this is happening only during the initial load, then pre-splitting the table at creation time (https://hbase.apache.org/book.html#_shell_tricks), or splitting the hot region after the first load, is an option.
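For reference, pre-splitting can be done at table creation in the hbase shell, either with explicit split points or a split algorithm; the table/family names and split points below are assumptions for your data:

```
hbase> create 'my_table', 'cf', SPLITS => ['a', 'm', 't']

# or let HBase generate N evenly spaced regions (HexStringSplit suits
# hashed/hex rowkeys):
hbase> create 'my_table', 'cf', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}
```

Note that pre-splitting only helps if the rowkeys actually fall into different split ranges; with a monotonic key and no salting, writes will still land in one region at a time.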