05-07-2014 12:02 AM
Would you happen to have successful cases where you overrode HBase timestamps with your own? The documentation advises against this throughout, but we are importing historical data into HBase.
I would be grateful if you have a step-by-step guide or could help us determine the consequences.
Does an HBase scan with a time range perform a full table scan?
05-07-2014 07:28 AM
In my experience, and as you have seen in the docs, it is generally not advisable to manually manipulate data timestamps, as HBase uses them for cell versioning. HBase assigns the current time, in milliseconds since the Unix epoch, to any piece of data you insert into a table without an explicit timestamp. HBase can also keep multiple versions of each cell (how many depends on the column family's maximum-versions setting), so if you overwrite a particular cell's value later, the updated value has a newer timestamp. This allows you to go back and retrieve older versions of that cell if you choose.
If you manually override timestamps when you insert data, you can end up with two copies of a particular cell that carry the same timestamp (e.g. one that has already been persisted to an HFile and one that is only in the memstore). When you then try to read that cell, HBase will not know which value is current.
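To make the collision concrete, here is a minimal sketch in plain Python. It is only a toy model of one cell's versions and does not use or reproduce the HBase client API; the class `VersionedCell` and its `max_versions` parameter are hypothetical names invented for illustration. In this model the later write simply replaces the earlier one at the same timestamp, which is a simplification: in a real cluster, which copy you read back is not something to rely on.

```python
import time


class VersionedCell:
    """Toy model of a single cell whose versions are keyed by timestamp (ms)."""

    def __init__(self, max_versions=3):
        # max_versions is a hypothetical stand-in for a max-versions setting.
        self.max_versions = max_versions
        self.versions = {}  # timestamp in ms since epoch -> value

    def put(self, value, timestamp_ms=None):
        # With no explicit timestamp, fall back to the current time in ms.
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        # Two writes with the same timestamp share one slot in this model.
        self.versions[timestamp_ms] = value
        # Drop everything older than the newest max_versions entries.
        for ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[ts]

    def get(self, n=1):
        # Return up to n (timestamp, value) pairs, newest first.
        return sorted(self.versions.items(), reverse=True)[:n]


cell = VersionedCell()
cell.put("first", timestamp_ms=1000)
cell.put("second", timestamp_ms=2000)   # newer timestamp: both versions kept
cell.put("clash", timestamp_ms=2000)    # same timestamp: "second" is gone
```

After these three writes, `cell.get(2)` returns two versions rather than three, because the write at timestamp 2000 happened twice and only one value can live there.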
You are better off storing the date/time of the data in a column and having your application use that column as its index for organizing the data chronologically. I wouldn't mess with the internal timestamps that HBase uses to organize its data.
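As a rough sketch of that suggestion, the snippet below encodes a record's historical date/time as an 8-byte big-endian integer of milliseconds since the epoch, which is one plausible way to store it as a column value. It uses only the Python standard library; the function names and the column name `d:event_time` are assumptions made up for this example, not anything HBase prescribes.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_event_time(dt):
    """Encode a timezone-aware datetime as 8 big-endian bytes of epoch ms."""
    ms = (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer arithmetic
    return struct.pack(">q", ms)


def decode_event_time(raw):
    """Inverse of encode_event_time; returns a UTC datetime."""
    (ms,) = struct.unpack(">q", raw)
    return EPOCH + timedelta(milliseconds=ms)


# Hypothetical row: the historical time lives in an ordinary column, and the
# cell's internal timestamp is left for HBase to assign at write time.
event_time = datetime(2014, 5, 7, 0, 2, tzinfo=timezone.utc)
row = {
    b"d:event_time": encode_event_time(event_time),
    b"d:payload": b"historical record",
}
```

For dates on or after the epoch, the big-endian encoding sorts byte-wise in the same order as the times themselves, so the application can compare or range-filter on the stored value directly.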