Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.
Labels (1)

Behavior

The number of cells returned to the client are normally filtered based on the table configuration; however, when using the RAW => true parameter, you can retrieve all of the versions kept by HBase, unless there was a major compaction or a flush to disk event in meanwhile.

Demonstration

Create a table with a single column family:

create 't1', 'f1' 

Configure it to retain a maximum version count of 3:

alter 't1',NAME=>'f1',VERSIONS=>3

Perform 4 puts:

put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4

Scan with RAW=>true. I used VERSIONS as 100 for a catch-all. It could have been anything greater than 3 (number of versions set previously). Unless specified, only the latest version is returned by scan command.

scan 't1',{RAW=>true,VERSIONS=>100}

The above scan returns all four versions.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479950685181, value=4 
r1  column=f1:c1,timestamp=1479950685155, value=3 
r1  column=f1:c1,timestamp=1479950685132, value=2 
r1  column=f1:c1,timestamp=1479950627736, value=1

Flush to disk:

flush ‘t1’

Then scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Do four more puts:

put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8

Flush to disk:

flush ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Six versions are returned:

ROW  COLUMN+CELL
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Force major compaction:

major_compact ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned:

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6

Conclusion

When deciding the number of versions to retain, it is best to treat that number as the minimum version count available at a given time and not as a given constant. Until a flush to disk and a major compaction, number of versions available is higher than the configured for the table.

1,338 Views
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.
Version history
Last update:
‎11-24-2016 02:10 AM
Updated by:
Contributors
Top Kudoed Authors