
HBase schema for time-series

I am looking to store time-series data in HBase. I am open to additions such as Phoenix or OpenTSDB, but I am primarily interested in storing the data in time order for fast retrieval. The schema below is a start, but my main issue is that it does not allow for multiple data points at the same time.

With the schema below, upserting the same sym and time causes a collision, and only one row is stored.
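The collision follows from UPSERT semantics: a row is identified by its primary key, so a second upsert with the same key overwrites the first. The sketch below models a table as a dictionary from primary-key tuple to column values. It assumes the primary key is (sym, time), as the question implies; the extra `seq` key column and the sample values are hypothetical and only illustrate one way to keep simultaneous points apart.

```python
# Minimal model of UPSERT semantics: a table maps a primary-key tuple to the
# non-key column values. Assumes the key is (sym, time); "seq" is a
# hypothetical extra key column, not part of the original schema.

def upsert(table, key, values):
    """Insert the row for key, or overwrite it if the key already exists."""
    table[key] = values

# Primary key (sym, time): the second upsert replaces the first row.
pk_sym_time = {}
upsert(pk_sym_time, ("ABC", 1000), {"qty": 10})
upsert(pk_sym_time, ("ABC", 1000), {"qty": 25})
print(len(pk_sym_time))  # 1

# Primary key (sym, time, seq): a disambiguating seq keeps both points.
pk_with_seq = {}
upsert(pk_with_seq, ("ABC", 1000, 0), {"qty": 10})
upsert(pk_with_seq, ("ABC", 1000, 1), {"qty": 25})
print(len(pk_with_seq))  # 2
```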

[Image: Phoenix CREATE TABLE statement (8036-phoenixcreate.png)]

Re: HBase schema for time-series

Hello Kirk,

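Monotonically increasing keys are always an issue from a distribution perspective in HBase, because every new row lands in the region holding the end of the key range, which turns that region into a write hotspot. In your scenario, putting sym first in the row key would help create logical splits, which could increase performance and probably avoid some headaches.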
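HBase sorts rows lexicographically by the bytes of the row key, so the order of the key components decides both how rows are distributed and what a scan returns. The sketch below builds a hypothetical composite key of symbol, a 0x00 separator, and an 8-byte big-endian timestamp; this layout is an illustrative assumption, not Phoenix's documented encoding. Because big-endian unsigned bytes sort the same way as non-negative integers, each symbol's points stay in time order, and each symbol occupies its own contiguous key range.

```python
import struct

SEP = b"\x00"  # assumed separator between the symbol and the timestamp

def row_key(sym: str, ts_millis: int) -> bytes:
    """Hypothetical row key: symbol bytes, separator, big-endian uint64 timestamp.

    Assumes sym contains no 0x00 bytes and ts_millis is non-negative.
    """
    return sym.encode("utf-8") + SEP + struct.pack(">Q", ts_millis)

def key_range(sym: str):
    """Start (inclusive) and stop (exclusive) keys covering every row for sym."""
    encoded = sym.encode("utf-8")
    return encoded + SEP, encoded + b"\x01"

keys = [
    row_key("XYZ", 2000),
    row_key("ABC", 3000),
    row_key("ABC", 1000),
    row_key("XYZ", 500),
]

# Emulate a range scan: sort like HBase does, then keep only ABC's range.
start, stop = key_range("ABC")
abc_scan = [k for k in sorted(keys) if start <= k < stop]
timestamps = [struct.unpack(">Q", k[-8:])[0] for k in abc_scan]
print(timestamps)  # [1000, 3000]
```

With sym leading the key, writes for different symbols go to different parts of the key space instead of all piling onto the newest timestamp.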

As for multiple data points at the same time: HBase can hold multiple versions of the same cell (as long as the column family is configured to retain more than one version), so although it may look like you lose the first value, you actually do not. Granted, Phoenix does not yet have a nice and easy way to query a specific version of a cell. Another option would be to make your row key sym plus a time range, then store the real timestamp in the column qualifier and the qty in the value. In the HBase world, columns can be whatever you want and do not have to be specified up front. In Phoenix, dynamic columns get you halfway there, or an array approach could make more sense.
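The "sym plus a time range" layout can be sketched as a wide-row model: each row covers one symbol and one time bucket, the column qualifier is the exact timestamp, and the cell value is the qty. The one-hour bucket width, the function names, and the in-memory dictionaries below are hypothetical stand-ins for an HBase table, chosen only to show how writes and range reads line up.

```python
from collections import defaultdict

BUCKET_MILLIS = 60 * 60 * 1000  # hypothetical bucket width: one hour per row

def bucket_start(ts_millis: int) -> int:
    """Round a timestamp down to the start of its bucket."""
    return ts_millis - (ts_millis % BUCKET_MILLIS)

# table[(sym, bucket)] -> {column qualifier (exact timestamp): qty}
table = defaultdict(dict)

def put(sym: str, ts_millis: int, qty: int) -> None:
    """Store one point: row = (sym, bucket), column = exact timestamp, value = qty."""
    table[(sym, bucket_start(ts_millis))][ts_millis] = qty

def scan(sym: str, start_millis: int, stop_millis: int):
    """Return (timestamp, qty) pairs for sym with start <= ts < stop, in time order."""
    results = []
    bucket = bucket_start(start_millis)
    while bucket < stop_millis:
        row = table.get((sym, bucket), {})  # .get does not create empty rows
        for ts in sorted(row):
            if start_millis <= ts < stop_millis:
                results.append((ts, row[ts]))
        bucket += BUCKET_MILLIS
    return results

put("ABC", 1_000, 10)
put("ABC", 2_000, 25)
put("ABC", BUCKET_MILLIS + 500, 7)
print(len(table))                            # 2
print(scan("ABC", 0, 2 * BUCKET_MILLIS))     # [(1000, 10), (2000, 25), (3600500, 7)]
```

In this sketch, two points with exactly the same timestamp still share one column qualifier, so truly simultaneous points would still need the cell-versions approach or a finer-grained qualifier.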

What is your intended way of querying and presenting the data?