Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
avatar
Master Guru

Often a timestamp is found within a phoenix table to support various business taxonomy requirements. Generally I have seen a pattern simply adding a timestamp datatype field. Many may not realize part of the hbase data model rowkey design is the timestamp. When a cell is written in hbase by default it is versioned using timestamp when the cell was created. This is out of the box.

Apache Phoenix provides a way of mapping HBase’s native row timestamp to a Phoenix column. This leverages various optimizations which HBase provides for time ranges on the store files as well as various query optimization capabilities built within Phoenix. - https://phoenix.apache.org/rowtimestamp.html

Based on your use case, set this "column" to a value of your liking. Take advantage of the built in data model design without adding yet another timestamp field. Commonly I find many end up creating secondary index on these additional timestamp fields due to time always being a variable at query time.

Lets take a look at a simple data model. We have customer entity with various attributes.

6970-datamodel.jpg

Now typically you would see the following create table statement on phoenix:

CREATE TABLE IF NOT EXISTS customer (
 firstName VARCHAR, 
 lastName VARCHAR,
 address VARCHAR,
 ssn integer NOT NULL,
 effective_date TIMESTAMP,
 ACTIVE_IND CHAR(1)
 CONSTRAINT pk PRIMARY KEY (ssn) ) KEEP_DELETED_CELLS=false;

Often once table is created/populated SQL queries start pouring in. Soon a pattern is established where effective_date is most commonly used during query time. Thereafter DBA would create a secondary index.

 CREATE INDEX my_idx ON CUSTOMER(effective_date DESC);

Now determine where this needs to be global or local index. I won't go into details about that now. However my point is there may be a easier way. Leverage the rowkey timestamp!

Instead of creating additional column this time I will assign effective_date to the rowkey timestamp. The rowkey timestamp is baked into the hbase data model. This is how it is done:

CREATE TABLE IF NOT EXISTS customer (
 firstName VARCHAR, 
 lastName VARCHAR,
 address VARCHAR,
 ssn integer NOT NULL,
 effective_date TIMESTAMP NOT NULL,
 ACTIVE_IND CHAR(1)
 CONSTRAINT pk PRIMARY KEY (ssn, effective_date ROW_TIMESTAMP) ) KEEP_DELETED_CELLS=false;

Now you can start querying customer table using the effective_date within your queries avoiding secondary index. There may be use cases where secondary index may make more sense then leveraging the core rowkey timestamp. Your use cases will drive that decision. The flexibility is there and you have choices.

3,818 Views
Comments
avatar
New Contributor

One problem.
SQLException: ERROR 1088 (44A19): Cannot create an index on a mutable table that has a ROW_TIMESTAMP column. tableName=<blah>

So if you pick this strategy you can only index your table if you set rows as immutable.