Optimal way of defining an HBase column family


I have a use case that stores data in an HBase table, and I would like to understand the optimal way of defining the column families in that table to reduce the number of Get calls.

The scenario is:

Given an account number, I need to retrieve the customer details and the other account numbers associated with that customer. I'm thinking of defining one row per account, with the row key acct|customer and a column family holding the account details, plus one more row with the customer ID as the row key and a column family holding an array of account details.

Example:

Row  Row key      Column + cell
1    acct1|cust1  acct1 values
2    acct2|cust1  acct2 values
3    acct3|cust1  acct3 values
4    cust1        column family with array of accounts [acct1, acct2, acct3]
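To make the number of round trips in this layout concrete, here is a rough Python sketch. It does not use the HBase client: `SortedTable` is a hypothetical in-memory stand-in that keeps rows in sorted row-key order, with `get` standing in for a Get and `scan_prefix` for a prefix Scan. The column names `d:info` and `d:accounts`, and storing the account array as a comma-separated string, are assumptions for illustration.

```python
# Hypothetical in-memory stand-in for an HBase table (NOT the HBase client
# API): rows are kept in a dict and scanned in sorted row-key order, the way
# HBase stores rows sorted by key.
class SortedTable:
    def __init__(self):
        self.rows = {}

    def put(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Exact row-key lookup, comparable to an HBase Get.
        return self.rows.get(key)

    def scan_prefix(self, prefix):
        # Rows whose key starts with prefix, in sorted order (like a prefix Scan).
        return [(k, self.rows[k]) for k in sorted(self.rows) if k.startswith(prefix)]


# The proposed single-table layout (column names and the comma-separated
# account list are assumptions).
t = SortedTable()
t.put("acct1|cust1", {"d:info": "acct1 values"})
t.put("acct2|cust1", {"d:info": "acct2 values"})
t.put("acct3|cust1", {"d:info": "acct3 values"})
t.put("cust1", {"d:accounts": "acct1,acct2,acct3"})


def lookup(acct):
    calls = 0
    # Call 1: only the account is known, so the full key acct|cust cannot be
    # built for a Get; a prefix scan on "acct|" is needed instead. The
    # trailing "|" keeps "acct1" from also matching "acct10".
    hits = t.scan_prefix(acct + "|")
    calls += 1
    if not hits:
        return None
    key, details = hits[0]
    cust = key.split("|", 1)[1]
    # Call 2: Get the customer row to read its account list.
    accounts = t.get(cust)["d:accounts"].split(",")
    calls += 1
    return cust, details, accounts, calls
```

Because the lookup only knows `acct2`, it cannot build the full `acct2|cust1` key for a Get, so in this model each request costs one prefix scan plus one Get.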

Please advise on the optimal way of defining the data model for this scenario.

Accepted solution


Essentially, all column families in a table share the same row key. If you want to look up data by two different keys, you need two tables.
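A minimal way to picture this, assuming a plain Python dict as a stand-in for a table (this is not the HBase API): each row key maps to all of that row's column families, so a family can only be reached through the row key. The family names `acct` and `cust` are hypothetical.

```python
# Hypothetical model (not the HBase API): a table maps each row key to ALL of
# that row's column families, so every family is addressed by the same key.
table = {
    "acct1|cust1": {
        "acct": {"info": "acct1 values"},  # column family "acct"
        "cust": {"info": "cust1 values"},  # column family "cust"
    },
}


def get_family(table, row_key, family):
    # The row key is the only index into the table; there is no way to reach
    # the "cust" family by the customer ID alone.
    row = table.get(row_key)
    return None if row is None else row.get(family)
```

Looking up `cust1` by itself returns nothing here, which is why a second access path keyed by customer needs its own table.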

So I think you should have two tables: one keyed by account|cust, as you say, to find the customer info for an account, and a separate table keyed by cust|account, so you can easily drill down to a customer and find all the accounts associated with it. You could also build the second table with cust as the key and an array of accounts, as you suggest, but then you always have to rewrite the whole list of accounts at once. If you key the second table by cust|account, you can freely add and delete account rows for a customer and do a scan to get all of its accounts.
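Here is a rough sketch of the two-table design, again using a hypothetical in-memory `SortedTable` in place of real HBase tables (it only mimics sorted row keys, row deletes, and prefix scans). The table names `acct_cust` and `cust_acct`, the helper functions, and the marker column `d:x` are all made up for illustration.

```python
# Hypothetical in-memory stand-in for an HBase table (NOT the HBase client
# API): rows are kept in a dict and scanned in sorted row-key order.
class SortedTable:
    def __init__(self):
        self.rows = {}

    def put(self, key, columns):
        self.rows.setdefault(key, {}).update(columns)

    def delete(self, key):
        # Like deleting a whole row; a missing row is ignored.
        self.rows.pop(key, None)

    def scan_prefix(self, prefix):
        # Rows whose key starts with prefix, in sorted order (like a prefix Scan).
        return [(k, self.rows[k]) for k in sorted(self.rows) if k.startswith(prefix)]


# Two tables, as in the answer (names are hypothetical):
#   acct_cust: row key "acct|cust", holds the account details
#   cust_acct: row key "cust|acct", one row per account of a customer
acct_cust = SortedTable()
cust_acct = SortedTable()


def add_account(cust, acct, details):
    # One new row per table; no existing account list is read or rewritten.
    acct_cust.put(f"{acct}|{cust}", details)
    # An HBase row needs at least one cell, so store a marker column (name assumed).
    cust_acct.put(f"{cust}|{acct}", {"d:x": ""})


def remove_account(cust, acct):
    acct_cust.delete(f"{acct}|{cust}")
    cust_acct.delete(f"{cust}|{acct}")


def customer_for(acct):
    # Prefix scan on "acct|" finds the acct|cust row for this account.
    hits = acct_cust.scan_prefix(acct + "|")
    return hits[0][0].split("|", 1)[1] if hits else None


def accounts_of(cust):
    # One prefix scan on "cust|" returns every account row of the customer.
    return [k.split("|", 1)[1] for k, _ in cust_acct.scan_prefix(cust + "|")]


# Example: three accounts for cust1, then acct2 is removed.
for a in ("acct1", "acct2", "acct3"):
    add_account("cust1", a, {"d:info": f"{a} values"})
remove_account("cust1", "acct2")
```

Adding or removing an account touches one row in each table and never rewrites an existing list, which is the advantage over keeping an array of accounts in a single `cust1` row. The trailing `|` in each scan prefix keeps `cust1` from also matching `cust10`.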
