Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Super Guru

Often a timestamp is found within a phoenix table to support various business taxonomy requirements. Generally I have seen a pattern simply adding a timestamp datatype field. Many may not realize part of the hbase data model rowkey design is the timestamp. When a cell is written in hbase by default it is versioned using timestamp when the cell was created. This is out of the box.

Apache Phoenix provides a way of mapping HBase’s native row timestamp to a Phoenix column. This leverages various optimizations which HBase provides for time ranges on the store files as well as various query optimization capabilities built within Phoenix. - https://phoenix.apache.org/rowtimestamp.html

Based on your use case, set this "column" to a value of your liking. Take advantage of the built in data model design without adding yet another timestamp field. Commonly I find many end up creating secondary index on these additional timestamp fields due to time always being a variable at query time.

Lets take a look at a simple data model. We have customer entity with various attributes.

6970-datamodel.jpg

Now typically you would see the following create table statement on phoenix:

CREATE TABLE IF NOT EXISTS customer (
 firstName VARCHAR, 
 lastName VARCHAR,
 address VARCHAR,
 ssn integer NOT NULL,
 effective_date TIMESTAMP,
 ACTIVE_IND CHAR(1)
 CONSTRAINT pk PRIMARY KEY (ssn) ) KEEP_DELETED_CELLS=false;

Often once table is created/populated SQL queries start pouring in. Soon a pattern is established where effective_date is most commonly used during query time. Thereafter DBA would create a secondary index.

 CREATE INDEX my_idx ON CUSTOMER(effective_date DESC);

Now determine where this needs to be global or local index. I won't go into details about that now. However my point is there may be a easier way. Leverage the rowkey timestamp!

Instead of creating additional column this time I will assign effective_date to the rowkey timestamp. The rowkey timestamp is baked into the hbase data model. This is how it is done:

CREATE TABLE IF NOT EXISTS customer (
 firstName VARCHAR, 
 lastName VARCHAR,
 address VARCHAR,
 ssn integer NOT NULL,
 effective_date TIMESTAMP NOT NULL,
 ACTIVE_IND CHAR(1)
 CONSTRAINT pk PRIMARY KEY (ssn, effective_date ROW_TIMESTAMP) ) KEEP_DELETED_CELLS=false;

Now you can start querying customer table using the effective_date within your queries avoiding secondary index. There may be use cases where secondary index may make more sense then leveraging the core rowkey timestamp. Your use cases will drive that decision. The flexibility is there and you have choices.

2,059 Views
Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 10:36 AM
Updated by:
 
Contributors
Top Kudoed Authors