
Behavior

The number of cells returned to the client is normally filtered according to the table configuration. With the RAW => true parameter, however, a scan can retrieve all of the versions HBase still holds, including those beyond the configured limit, unless a flush to disk or a major compaction has removed them in the meantime.

Demonstration

Create a table with a single column family:

create 't1', 'f1' 

Configure it to retain a maximum version count of 3:

alter 't1',NAME=>'f1',VERSIONS=>3

Perform 4 puts:

put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4

Scan with RAW=>true. I used VERSIONS=>100 as a catch-all; any value greater than 3 (the number of versions configured above) would work. Unless VERSIONS is specified, the scan command returns only the latest version.

scan 't1',{RAW=>true,VERSIONS=>100}

The above scan returns all four versions.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479950685181, value=4 
r1  column=f1:c1,timestamp=1479950685155, value=3 
r1  column=f1:c1,timestamp=1479950685132, value=2 
r1  column=f1:c1,timestamp=1479950627736, value=1
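
To make the difference between a normal and a raw scan concrete, here is a minimal Python sketch of the read-side filtering described under Behavior. It is a toy model, not HBase code: the function read_versions and its parameters are hypothetical, and it assumes that a normal scan is capped by the column family's VERSIONS setting while a raw scan is limited only by the VERSIONS value passed to the scan.

```python
# Toy model of the cells stored for one row/column. NOT HBase code;
# read_versions and its parameter names are hypothetical.

def read_versions(cells, cf_max_versions, requested=1, raw=False):
    """Return the cells a scan would see in this simplified model.

    cells: list of (timestamp, value) tuples for one row/column.
    cf_max_versions: VERSIONS configured on the column family.
    requested: VERSIONS passed to the scan (1 when not specified).
    raw: when True, the column-family limit is not applied.
    """
    newest_first = sorted(cells, key=lambda cell: cell[0], reverse=True)
    # Assumption: a normal scan is capped by the column-family setting,
    # a raw scan only by the number of versions requested.
    limit = requested if raw else min(requested, cf_max_versions)
    return newest_first[:limit]


# The four puts from the demonstration, still unflushed.
cells = [
    (1479950627736, "1"),
    (1479950685132, "2"),
    (1479950685155, "3"),
    (1479950685181, "4"),
]

default_scan = read_versions(cells, cf_max_versions=3)
versioned_scan = read_versions(cells, cf_max_versions=3, requested=100)
raw_scan = read_versions(cells, cf_max_versions=3, requested=100, raw=True)

print(len(default_scan), len(versioned_scan), len(raw_scan))  # prints: 1 3 4
```

In this model only the raw scan sees the fourth, oldest version, which matches the output above.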

Flush to disk:

flush 't1'

Then scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Do four more puts:

put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8

Flush to disk:

flush 't1'

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Six versions are returned, three from each of the two flushes:

ROW  COLUMN+CELL
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2
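
One way to picture this result is as a merge of two flushed files, each already trimmed to three versions. The Python sketch below is only an illustration under that assumption: the names store_file_1 and store_file_2 are hypothetical, the timestamps and values are copied from the output above, and the merge uses Python's heapq rather than anything HBase-specific.

```python
import heapq

# Each flush wrote its own file, trimmed to three versions (assumption
# based on the scan results in this article). Cells are (timestamp, value),
# newest first.
store_file_1 = [(1479952079260, "4"), (1479952079234, "3"), (1479952079209, "2")]
store_file_2 = [(1479952349970, "8"), (1479952349925, "7"), (1479952349895, "6")]

# Both inputs are sorted newest first, so a k-way merge with reverse=True
# yields one newest-first stream without applying any version limit.
merged = list(
    heapq.merge(store_file_1, store_file_2, key=lambda cell: cell[0], reverse=True)
)

for timestamp, value in merged:
    print(f"r1  column=f1:c1, timestamp={timestamp}, value={value}")
```

No step in this merge enforces the limit of three across files, which is why the raw scan still sees six versions at this point.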

Force major compaction:

major_compact 't1'

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned:

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6

Conclusion

When deciding how many versions to retain, treat the configured number as the minimum version count that may be available at a given time, not as a fixed constant. Until the data has been both flushed to disk and major-compacted, the number of versions available can be higher than the number configured for the table.
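
The whole sequence can be replayed with a small Python simulation. This is a sketch under stated assumptions, not HBase's implementation: the ToyStore class and its methods are hypothetical, a flush is assumed to trim only the in-memory cells to the configured limit, and a major compaction is assumed to rewrite all flushed files into one while applying the limit across them.

```python
import itertools


class ToyStore:
    """Simplified model of one row/column; not HBase's implementation."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._clock = itertools.count(1)  # stand-in for timestamps
        self.memstore = []                # unflushed (seq, value) cells
        self.store_files = []             # one list of cells per flush

    def put(self, value):
        self.memstore.append((next(self._clock), value))

    def _newest(self, cells):
        # Keep only the newest max_versions cells.
        return sorted(cells, reverse=True)[: self.max_versions]

    def flush(self):
        # Assumption: the limit is applied to the flushed data only.
        if self.memstore:
            self.store_files.append(self._newest(self.memstore))
            self.memstore = []

    def major_compact(self):
        # Assumption: all files are merged and the limit applied globally.
        all_cells = [cell for f in self.store_files for cell in f]
        self.store_files = [self._newest(all_cells)] if all_cells else []

    def raw_scan(self):
        # A raw scan returns every cell still held, newest first.
        cells = self.memstore + [cell for f in self.store_files for cell in f]
        return [value for _, value in sorted(cells, reverse=True)]


store = ToyStore(max_versions=3)
counts = []

for value in (1, 2, 3, 4):
    store.put(value)
counts.append(len(store.raw_scan()))  # before any flush

store.flush()
counts.append(len(store.raw_scan()))  # after the first flush

for value in (5, 6, 7, 8):
    store.put(value)
store.flush()
counts.append(len(store.raw_scan()))  # after the second flush

store.major_compact()
counts.append(len(store.raw_scan()))  # after major compaction

print(counts)  # prints: [4, 3, 6, 3]
```

The counts match the four scans in the demonstration: the configured limit is reached only after the major compaction, and in between a raw scan can return more versions than that.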

Last update: 11-24-2016 02:10 AM