
Best Apache HBase Rowkey Design for an Inventory Use Case

We are storing site-article-wise inventory stock and sales in an HBase table.
The rowkey is a combination of Site+Article, so there is only one distinct record per site-article pair.
However, this design gives poor performance for HBase scans.
For example, if Site=A and Article=100, my current rowkey is A100.
The table contains 40 million records.
What would be a better rowkey design for faster scans in this use case?
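Separately from scan speed, plain concatenation such as A100 is ambiguous when site codes or article numbers vary in length (site A1 with article 00 yields the same string), and numbers stored as text sort lexicographically, so A100 sorts before A20. Below is a minimal sketch of a fixed-width composite key. It assumes site codes are ASCII and at most SITE_WIDTH bytes (a hypothetical limit) and that article IDs are non-negative integers. The names make_rowkey and parse_rowkey are illustrative, not part of any HBase API. Every row of a site then shares an exact SITE_WIDTH-byte prefix, which a prefix scan can target.

```python
import struct

SITE_WIDTH = 4  # hypothetical: maximum length of a site code, in bytes


def make_rowkey(site: str, article: int) -> bytes:
    """Fixed-width Site+Article key: zero-padded site code, then an 8-byte big-endian article ID."""
    site_bytes = site.encode("ascii")
    if len(site_bytes) > SITE_WIDTH:
        raise ValueError(f"site code longer than {SITE_WIDTH} bytes: {site!r}")
    # Padding puts the article at the same offset in every key, so ("A", 100) and
    # ("A1", 0) can no longer collide. Big-endian encoding makes byte order match
    # numeric order (">Q" raises struct.error for negative IDs).
    return site_bytes.ljust(SITE_WIDTH, b"\x00") + struct.pack(">Q", article)


def parse_rowkey(key: bytes) -> tuple[str, int]:
    """Inverse of make_rowkey."""
    site = key[:SITE_WIDTH].rstrip(b"\x00").decode("ascii")
    (article,) = struct.unpack(">Q", key[SITE_WIDTH:])
    return site, article


if __name__ == "__main__":
    # Plain string concatenation collides and sorts as text...
    print("A" + "100" == "A1" + "00")  # True
    print("A100" < "A20")  # True
    # ...while the fixed-width key does neither.
    print(make_rowkey("A", 100) == make_rowkey("A1", 0))  # False
    print(make_rowkey("A", 20) < make_rowkey("A", 100))  # True
```

For non-negative IDs the 8-byte big-endian article encoding is the same byte layout Java's Bytes.toBytes(long) produces, so keys written from Java and from this sketch would sort identically.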


Cloudera Employee

Hello @prathamesh_h 

The HBase Reference Guide has a detailed section on rowkey design: https://hbase.apache.org/book.html#rowkey.design

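At a high level, you want your rowkeys to be as evenly distributed as possible. HBase stores rows sorted by rowkey and divides the table into regions that each hold a contiguous range of keys. If you have very few sites and many articles per site, the rows for a site (and the scan and write load on them) concentrate in a narrow key range served by only a few regions, and you may not see great performance.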
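To make "evenly distributed" concrete, you can check a candidate key design by counting how many sample keys fall into each region, since each region owns the contiguous range between two split points. This is a toy sketch: the split points are hypothetical (a real table's split points come from pre-splitting or from HBase splitting regions as they grow), and region_index and rows_per_region are illustrative helper names.

```python
from bisect import bisect_right
from collections import Counter


def region_index(key: bytes, split_keys: list[bytes]) -> int:
    """Index of the region that holds `key`.

    With sorted split keys s0 < s1 < ..., region 0 covers keys below s0 and
    region i covers [s(i-1), s(i)): start key inclusive, end key exclusive.
    """
    return bisect_right(split_keys, key)


def rows_per_region(keys, split_keys: list[bytes]) -> list[int]:
    """Count how many rowkeys land in each of the len(split_keys) + 1 regions."""
    counts = Counter(region_index(k, split_keys) for k in keys)
    return [counts[i] for i in range(len(split_keys) + 1)]


if __name__ == "__main__":
    # Hypothetical table pre-split into four regions on the first character.
    splits = [b"B", b"C", b"D"]
    # Two sites with many articles each, keyed Site+Article as in the question.
    keys = [f"{site}{article}".encode("ascii")
            for site in ("A", "B") for article in range(1000)]
    print(rows_per_region(keys, splits))  # [1000, 1000, 0, 0]
```

With only two sites, half of this hypothetical table's regions receive no rows, so all reads and writes for those sites land on the other two.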

You can consider breaking the articles within each site into smaller buckets and including the bucket in your rowkey.