
Timeseries data insertion using Phoenix

Contributor

Hi, I'm trying to insert timeseries data into HBase using Phoenix 4.4.0. I need one row per device per day, and I'm capturing data from devices every minute, so 1440 records (columns) have to be stored in each daily row. I have designed a schema as follows:

create table devicedata (deviceId integer not null, day date not null, ts timestamp, val double, CONSTRAINT my_pk PRIMARY KEY (deviceId, day));

But when I use upsert, it updates the values, and in a row I always have only the last value of the day. Help me design a schema using Phoenix which can store all 1440 values for a day.
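For illustration, here is a minimal sketch of why this happens with the schema above (sample values are hypothetical; TO_DATE/TO_TIMESTAMP are Phoenix built-ins): the primary key is only (deviceId, day), so all upserts for one device-day share a single row key and each overwrites the previous one.

```sql
-- All three upserts share the key (1, 2016-01-01), so each one
-- overwrites the TS and VAL cells of the same underlying row.
UPSERT INTO devicedata VALUES (1, TO_DATE('2016-01-01'), TO_TIMESTAMP('2016-01-01 00:00:00'), 10.5);
UPSERT INTO devicedata VALUES (1, TO_DATE('2016-01-01'), TO_TIMESTAMP('2016-01-01 00:01:00'), 11.2);
UPSERT INTO devicedata VALUES (1, TO_DATE('2016-01-01'), TO_TIMESTAMP('2016-01-01 00:02:00'), 12.0);

-- Returns a single row holding only the last value of the day:
SELECT * FROM devicedata WHERE deviceId = 1 AND day = TO_DATE('2016-01-01');
```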

1 ACCEPTED SOLUTION


Hi @Sasikumar Natarajan

You can include the timestamp as part of the primary key, so you will have 1440 rows per device per day, and when you search by device id and date you still get all 1440 records as usual. It will also be fast because it's a range query.

create table devicedata (deviceId integer not null, day date not null, ts timestamp not null, val double, CONSTRAINT my_pk PRIMARY KEY (deviceId, day, ts));

Since it's time series data, your regions might become a bottleneck because continuous writes might all go to a single region. You can use salt buckets to avoid that:

create table devicedata (deviceId integer not null, day date not null, ts timestamp not null, val double, CONSTRAINT my_pk PRIMARY KEY (deviceId, day, ts)) SALT_BUCKETS=N;
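With ts in the key, reading one device's day back is an ordinary range scan over the leading primary key columns. A sketch, assuming the table above and a hypothetical device id:

```sql
-- Fetch all 1440 readings for device 1 on a given day.
-- This is a range scan on the leading PK columns (deviceId, day),
-- so it touches only the matching key range.
SELECT ts, val
FROM devicedata
WHERE deviceId = 1
  AND day = TO_DATE('2016-01-01')
ORDER BY ts;
```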


5 REPLIES


Contributor

Hi @Rajeshbabu Chintaguntla, actually in HBase the data would be inserted as columns within a column family, but in Phoenix, if we insert a row for each minute, the row count will become huge very soon. Does that affect performance? Do you have any links that explain how Phoenix translates data and stores it in HBase?


@Sasikumar Natarajan Phoenix translates an upsert query into a batch of KeyValues, one batch per row. For each non-primary-key column there is a KeyValue whose value part is that column's value from the upsert query. All the primary key column values are concatenated to form the row key, which is the row part of each KeyValue. So with your schema we have 1440 rows, but each row carries only one KeyValue. It's therefore better to have 1440 rows than 1440 columns per row or 1440 versions per row, and performance-wise there won't be much difference.
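A rough sketch of what this means on the HBase side, assuming the (deviceId, day, ts) schema above (the exact byte encoding is Phoenix-internal; this only shows the shape):

```sql
-- One upsert...
UPSERT INTO devicedata VALUES (1, TO_DATE('2016-01-01'), TO_TIMESTAMP('2016-01-01 00:01:00'), 11.2);

-- ...becomes roughly one HBase cell for the only non-PK column
-- (plus the empty marker cell Phoenix writes for every row):
--   row key : encode(deviceId) + encode(day) + encode(ts)
--   cell    : default column family "0", qualifier VAL, value = 11.2
```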

Contributor

@Rajeshbabu Chintaguntla Thanks for the answer. I'll try this.


Don't you want to have 1440 rows? If you want 1440 records in a single row you need 1440 columns, which is not good, or you can set the number of versions to 1440 and then access all the versions from HBase, which may also not be a good idea.