Community Articles
Find and share helpful community-sourced technical articles
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.
Labels (1)

Behavior

The number of cells returned to the client are normally filtered based on the table configuration; however, when using the RAW => true parameter, you can retrieve all of the versions kept by HBase, unless there was a major compaction or a flush to disk event in meanwhile.

Demonstration

Create a table with a single column family:

create 't1', 'f1' 

Configure it to retain a maximum version count of 3:

alter 't1',NAME=>'f1',VERSIONS=>3

Perform 4 puts:

put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4

Scan with RAW=>true. I used VERSIONS as 100 for a catch-all. It could have been anything greater than 3 (number of versions set previously). Unless specified, only the latest version is returned by scan command.

scan 't1',{RAW=>true,VERSIONS=>100}

The above scan returns all four versions.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479950685181, value=4 
r1  column=f1:c1,timestamp=1479950685155, value=3 
r1  column=f1:c1,timestamp=1479950685132, value=2 
r1  column=f1:c1,timestamp=1479950627736, value=1

Flush to disk:

flush ‘t1’

Then scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Do four more puts:

put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8

Flush to disk:

flush ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Six versions are returned:

ROW  COLUMN+CELL
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Force major compaction:

major_compact ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned:

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6

Conclusion

When deciding the number of versions to retain, it is best to treat that number as the minimum version count available at a given time and not as a given constant. Until a flush to disk and a major compaction, number of versions available is higher than the configured for the table.

646 Views
Don't have an account?
Coming from Hortonworks? Activate your account here
Version history
Revision #:
1 of 1
Last update:
‎11-24-2016 02:10 AM
Updated by:
 
Contributors
Top Kudoed Authors