HBase: All data stored in one region

Expert Contributor

I'm importing HFiles into HBase using the command:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles my_table

When I looked at the HBase Master UI, I saw that all the data seems to be stored in a single region.

The HFiles were created by a Spark application, using this command:

JavaPairRDD<String, MyEntry> myPairRDD = ...
myPairRDD.repartitionAndSortWithinPartitions(new HashPartitioner(hbaseRegions));

Why is the data not split across all regions? What am I doing wrong?


Cloudera Employee

You need to implement a salted row key to avoid region hot-spotting.

If you are planning to use Phoenix for end-user queries, it is better to insert the data through the Phoenix API instead of HBase directly. Please refer to the Phoenix documentation on how to implement salting.

It seems your row keys are monotonically increasing, so all the keys in the data load belong to a single region, which results in hot-spotting. This is a general problem with any key-value store if the row key is not chosen carefully.

If you don't have a row key that is non-monotonic or random in nature, look into hashing your key or salting it (prefixing it with cyclic numbers, although this is not recommended for point lookups).
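To illustrate the salting idea, here is a minimal sketch in plain Java. The class name `SaltedKeys`, the two-digit prefix format, and the bucket count of 8 are assumptions for the example, not anything HBase or Phoenix prescribes. It uses a hash-derived prefix rather than a cyclic counter, so the prefix can be recomputed from the original key when doing a point lookup.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of HBase or Phoenix.
final class SaltedKeys {
    private SaltedKeys() {}

    /** Prefixes the row key with a two-digit bucket number derived from its hash. */
    static String salt(String rowKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 1 and 100");
        }
        // floorMod keeps the bucket non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // Monotonically increasing keys (made-up sample values) get different prefixes.
        for (int i = 0; i < 5; i++) {
            System.out.println(salt("20240101-00000" + i, 8));
        }
    }
}
```

The trade-off is that a range scan over the original key order now has to fan out over every bucket prefix, so this fits best when reads are mostly point lookups.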

If you think this is happening only during the initial load, the options are to pre-split the table when creating it or to split the hot region after the first load.
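As a rough sketch of the pre-split option, the following plain Java computes split points for a table whose row keys start with a two-digit decimal bucket prefix ("00", "01", ...). That key layout, the class name `PreSplit`, and the bucket count are assumptions for the example. An array like this could then be passed as the split keys when creating the table, for example through the `createTable` overload of the HBase `Admin` interface that accepts a `byte[][]` of split keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration; it only computes the split points.
final class PreSplit {
    private PreSplit() {}

    /** Returns buckets - 1 split keys ("01", "02", ...), which yields one region per bucket. */
    static byte[][] splitKeys(int buckets) {
        if (buckets < 2 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 2 and 100");
        }
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            // Each split key is the first possible row key of bucket i.
            splits[i - 1] = String.format(Locale.ROOT, "%02d", i)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : splitKeys(4)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With the table pre-split this way, the HFiles for a bulk load would also need to be generated against the same region boundaries, so that each file falls inside a single region.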
