HBase schema design for complex data

Contributor

I would like some advice on HBase schema design.

For example, there are 2,000 patients:

1. Each patient has a name, sex, age, and hospital_ID.

2. Activity data such as heart rate, location, and steps is recorded for each patient every minute.

3. Each patient will take several questionnaires.

How should I organise the HBase table?

My current idea is to use the patient_ID as the row key, so each patient has only one row in the HBase table. All activity data would then be grouped in a nested table, and that activity data table will have millions of rows.

So, the table will have three column families.

CF1:info,

CF2:activity_data,

CF3:questionnaires.

Then,

CF1:info (name, sex, age, ID)

CF2:activity_data (data (a nested table))

CF3:questionnaires (questionnaire_ID (a nested table))
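To make the size of this layout concrete, here is a rough Python model of it. It is only a sketch, not HBase client code: the "nested table" is read here as many column qualifiers inside the activity_data family, and the qualifier format `metric:minute`, the function names, and the sample values are all assumptions made up for illustration.

```python
# Hypothetical in-memory model of the one-row-per-patient layout.
# Nothing here talks to HBase; names and values are illustrative only.
MINUTES_PER_DAY = 24 * 60
METRICS = ("heart_rate", "location", "steps")  # assumed qualifier prefixes


def cells_in_activity_family(days: int, metrics=METRICS) -> int:
    """Cells accumulated in activity_data for ONE patient row after `days` days."""
    return days * MINUTES_PER_DAY * len(metrics)


def build_row(minutes: int, metrics=METRICS) -> dict:
    """Model one wide row as {column family: {qualifier: value}}."""
    row = {
        "info": {"name": "example", "sex": "F", "age": 60, "ID": "H1"},
        "activity_data": {},
        "questionnaires": {},
    }
    for minute in range(minutes):
        for metric in metrics:
            # One cell per metric per minute, all inside the same row.
            row["activity_data"][f"{metric}:{minute}"] = 0  # placeholder value
    return row


row = build_row(minutes=3)
print(len(row["activity_data"]))           # 9 cells after 3 minutes
print(cells_in_activity_family(days=365))  # 1576800 cells after one year
```

Under these assumptions a single patient row gains about 1.6 million activity cells per year, while the info family stays at a handful of cells, which is the imbalance the question is worried about.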

I don't know whether this is a smart way to design the HBase schema.

Please provide me with some advice.

Thank you very much

1 ACCEPTED SOLUTION

Guru
10 REPLIES

Contributor

Hi Greg,

Many thanks for your reply.

I have one question about the column family and column design.

Suppose I have a column family called info that stores each user's name, age, sex, and so on, and another column family that stores time series activity data. For example, each patient will have 1 billion records covering 5 different features. Then the column family "info" will be duplicated 1 billion times.

How can I avoid this problem?

Many thanks for your help in advance

Guru

This is where Phoenix will be quite useful: it is a SQL interface to HBase and supports joins with other tables (but they should be simple joins ... usually with just one other table). So you can have one table whose composite key is formed from patient_id and time, with columns holding the data on that patient that changes with time. Then you can have a lookup table with patient_id as the key and join the two whenever you need to. The SlideShare link in the original answer shows you how to build these tables. This may also help: https://phoenix.apache.org/joins.html
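As a rough illustration of the composite key idea at the HBase level: HBase keeps rows sorted by the raw bytes of the row key, so a key built from patient_id followed by time keeps each patient's readings contiguous and in time order. The sketch below is plain Python with no HBase client; the fixed-width big-endian encoding and the helper names `activity_row_key` and `scan_range` are assumptions for the example, not something Phoenix or HBase prescribes.

```python
import struct


def activity_row_key(patient_id: int, epoch_minute: int) -> bytes:
    """Encode (patient_id, time) as a fixed-width big-endian key.

    Big-endian unsigned integers sort bytewise in the same order as the
    numbers themselves, which matches how HBase orders row keys.
    """
    return struct.pack(">IQ", patient_id, epoch_minute)


def scan_range(patient_id: int, start_minute: int, stop_minute: int):
    """Start (inclusive) and stop (exclusive) keys for one patient's time window."""
    return (
        activity_row_key(patient_id, start_minute),
        activity_row_key(patient_id, stop_minute),
    )


# Simulate the sorted key space with keys inserted out of order.
keys = sorted(activity_row_key(p, m) for p in (2, 1) for m in (30, 10, 20))

start, stop = scan_range(1, 10, 30)
hits = [struct.unpack(">IQ", k) for k in keys if start <= k < stop]
print(hits)  # [(1, 10), (1, 20)]
```

With keys shaped like this, reading one patient's activity for a time window is a single contiguous range rather than a filter over the whole table.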

I am not sure of your specific query patterns but based on the info given, this should solve your problem.
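Here is a minimal sketch of the two-table layout and the join. It uses Python's built-in sqlite3 purely as a stand-in so it can be run anywhere; it is not Phoenix, and Phoenix's DDL and data types differ in detail. The table names, column names, and sample rows are assumptions made up for the example.

```python
import sqlite3

# sqlite3 is only a stand-in for a SQL layer; this is NOT Phoenix syntax.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: one row per patient, keyed by patient_id.
    CREATE TABLE patient (
        patient_id  INTEGER PRIMARY KEY,
        name        TEXT,
        sex         TEXT,
        age         INTEGER,
        hospital_id TEXT
    );
    -- Time series table: composite key of patient_id and time.
    CREATE TABLE activity (
        patient_id INTEGER NOT NULL,
        ts         INTEGER NOT NULL,
        heart_rate INTEGER,
        location   TEXT,
        steps      INTEGER,
        PRIMARY KEY (patient_id, ts)
    );
    """
)

conn.execute("INSERT INTO patient VALUES (1, 'Example Patient', 'F', 60, 'H1')")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)",
    [(1, 0, 72, "ward A", 0), (1, 1, 75, "ward A", 12), (1, 2, 74, "ward B", 30)],
)

# Join the patient details onto the readings only when a query needs them.
rows = conn.execute(
    """
    SELECT p.name, a.ts, a.heart_rate, a.steps
    FROM activity a
    JOIN patient p ON p.patient_id = a.patient_id
    WHERE a.patient_id = ?
    ORDER BY a.ts
    """,
    (1,),
).fetchall()

patient_rows = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(patient_rows, len(rows))  # 1 3
```

The patient's name, sex, and age are stored once in the lookup table no matter how many activity rows exist, so nothing in info is repeated per reading.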

Contributor

Many thanks for your help.

You have really done me a big favour.

I will have a look at the materials you provided first.

Thanks,

Guru

@Bin Ye ... how is your follow-up going?

Contributor

@Greg Keys Hi, thanks for asking. I decided to split the table and store the data separately. Thanks.

Guru

@Bin Ye If you found the answer useful, please accept or upvote ... that is how the community works 🙂

Contributor

Thanks, I accepted and voted.

Guru

Hi @Bin Ye Keep posting (questions, answers, articles) and sharing your experience ... everyone in the community benefits 🙂