Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar
Super Guru

Behavior

The number of cells returned to the client are normally filtered based on the table configuration; however, when using the RAW => true parameter, you can retrieve all of the versions kept by HBase, unless there was a major compaction or a flush to disk event in meanwhile.

Demonstration

Create a table with a single column family:

create 't1', 'f1' 

Configure it to retain a maximum version count of 3:

alter 't1',NAME=>'f1',VERSIONS=>3

Perform 4 puts:

put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4

Scan with RAW=>true. I used VERSIONS as 100 for a catch-all. It could have been anything greater than 3 (number of versions set previously). Unless specified, only the latest version is returned by scan command.

scan 't1',{RAW=>true,VERSIONS=>100}

The above scan returns all four versions.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479950685181, value=4 
r1  column=f1:c1,timestamp=1479950685155, value=3 
r1  column=f1:c1,timestamp=1479950685132, value=2 
r1  column=f1:c1,timestamp=1479950627736, value=1

Flush to disk:

flush ‘t1’

Then scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned.

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Do four more puts:

put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8

Flush to disk:

flush ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Six versions are returned:

ROW  COLUMN+CELL
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6 
r1  column=f1:c1,timestamp=1479952079260, value=4 
r1  column=f1:c1,timestamp=1479952079234, value=3 
r1  column=f1:c1,timestamp=1479952079209, value=2

Force major compaction:

major_compact ‘t1’

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned:

ROW  COLUMN+CELL 
r1  column=f1:c1,timestamp=1479952349970, value=8 
r1  column=f1:c1,timestamp=1479952349925, value=7 
r1  column=f1:c1,timestamp=1479952349895, value=6

Conclusion

When deciding the number of versions to retain, it is best to treat that number as the minimum version count available at a given time and not as a given constant. Until a flush to disk and a major compaction, number of versions available is higher than the configured for the table.

1,482 Views