Support Questions

Is an inconsistent read possible in HBase with the MemStore and BlockCache?

Rising Star

Consider a scenario where data is written to an HFile in HBase. A read then occurs and the result is saved in the BlockCache. Next, an update occurs for data that is in the BlockCache, and the update is saved in the MemStore.

Now, if the same data is read, HBase looks for it first in the BlockCache, and if the cache has not yet expired, the result is found there. If that data is returned to the client, it would be an inconsistent read.

Is it possible for the above scenario to occur, or is my understanding wrong?

1 ACCEPTED SOLUTION: Smarak's reply to the follow-up question (reproduced in the thread below).

Super Collaborator

Hello @pacman,

Thanks for using Cloudera Community. HBase handles such a scenario with a Read Merge. A Read "merges" Key Values from the Block Cache, MemStore, and HFiles as follows:

 

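  • The Scanner looks for the Row Cells in the Block Cache (Read Cache), which holds recently read HFile blocks.
  • The Scanner also looks in the MemStore (Write Cache), which holds edits that have not yet been flushed to an HFile.
  • If the Scanner does not find all of the Row Cells in the MemStore and Block Cache, the remaining blocks are read from the HFiles.

The Cells found in all of these places are then merged, and for each Cell the version with the newest timestamp is returned. An unflushed update in the MemStore therefore takes precedence over an older value cached in the Block Cache.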
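The merge can be sketched as a toy Java model (Java 16+ for records). This is a minimal sketch assuming each source is just a list of Cells carrying a timestamp; the `Cell` record and `read` method are hypothetical names, not HBase classes, and real HBase also handles tie-breaking and delete markers, which are omitted here. It shows that the newest timestamp wins no matter which source the Cell came from.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of a read merge; not HBase's actual scanner classes.
public class ReadMergeSketch {
    // One version of a cell: row, column, timestamp, and value.
    record Cell(String row, String column, long timestamp, String value) {}

    // Look at every source and return the newest version of row/column, if any exists.
    static Optional<Cell> read(String row, String column, List<List<Cell>> sources) {
        return sources.stream()
                .flatMap(List::stream)
                .filter(c -> c.row().equals(row) && c.column().equals(column))
                .max(Comparator.comparingLong(Cell::timestamp));
    }

    public static void main(String[] args) {
        // Block of a flushed HFile that is already cached in the Block Cache: old value.
        List<Cell> blockCache = List.of(new Cell("row1", "cf:q", 100L, "old"));
        // Unflushed update sitting in the MemStore: new value with a newer timestamp.
        List<Cell> memStore = List.of(new Cell("row1", "cf:q", 200L, "new"));

        Cell result = read("row1", "cf:q", List.of(blockCache, memStore)).orElseThrow();
        System.out.println(result.value()); // prints "new"
    }
}
```

Because the result is a maximum over all sources, the order in which the Block Cache and MemStore are consulted doesn't change the answer; swapping the two lists in `main` still prints `new`.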

 

Hope the above answers your query. If yes, kindly mark the post as resolved.

 

Regards, Smarak

Rising Star

@smdas So even if a key is updated in the MemStore but not in the BlockCache, the Read Merge overrides the BlockCache values with the MemStore values directly, without updating the HFile? Because for the HFile to be updated, a flush has to happen, right? Or are the MemStore and BlockCache checks done simultaneously?

Super Collaborator

Hello @pacman, HFiles are never modified in place; a new HFile is only written when the MemStore is flushed. In the scenario you shared, the Read Merge returns the updated values without updating the HFile.

 

You can verify this with the following steps:

 

1. Create a Table with 1 Column Family (CF). Insert 1 Row and flush the Table to ensure 1 HFile is created.
2. Read the Table, which reads from the HFile and places the block in the BlockCache. The BlockCache size is visible in the HBase UI at the CF level within the Table-level statistics.
3. Read the Table again, which now reads from the BlockCache. Verify this via the Hit Ratio in the BlockCache stats in the HBase UI.
4. Update the Row using the same RowKey, but with a different Value for the Column Qualifier within the Column Family.
5. Read the Table again. You should get the updated Value from Step 4.
6. To make things interesting, remove the WAL file of the RegionServer hosting the Table's single Region, and kill the RegionServer process. This ensures the MemStore isn't flushed (because of the ungraceful exit) and the WAL can't be replayed.
7. Start the RegionServer, which creates a new WAL file. Read the Table again. It shows the Value from Step 1.

 

This confirms that the HFile isn't updated when HBase merges the Values from the BlockCache and MemStore while reading the Table in Step 5.
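The verification steps can be mirrored in a toy in-memory model, which may help show why Step 5 returns the new Value while Step 7 returns the old one. This is a hypothetical sketch, not HBase code: `FlushAndCrashSketch`, `killAndRestart`, and the other names are invented, timestamps come from a simple counter, and the BlockCache is left out because it only caches blocks of HFiles and does not change which Value is returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of one Region with one Column Family; names are not HBase APIs.
public class FlushAndCrashSketch {
    record Cell(long timestamp, String value) {}
    record Edit(String rowKey, Cell cell) {}

    private final List<Map<String, Cell>> hfiles = new ArrayList<>(); // never modified once written
    private final List<Edit> wal = new ArrayList<>();                // edits not yet flushed
    private Map<String, Cell> memStore = new HashMap<>();
    private long clock = 0;

    void put(String rowKey, String value) {
        Edit edit = new Edit(rowKey, new Cell(++clock, value));
        wal.add(edit);                     // write-ahead: log the edit first
        memStore.put(rowKey, edit.cell()); // then apply it to the MemStore
    }

    // A flush writes the MemStore out as a NEW immutable HFile; existing HFiles are untouched.
    void flush() {
        hfiles.add(Map.copyOf(memStore));
        memStore = new HashMap<>();
        wal.clear(); // flushed edits no longer need replaying
    }

    // Read merge: the newest Cell across the MemStore and every HFile wins.
    String get(String rowKey) {
        Cell newest = memStore.get(rowKey);
        for (Map<String, Cell> hfile : hfiles) {
            Cell c = hfile.get(rowKey);
            if (c != null && (newest == null || c.timestamp() > newest.timestamp())) {
                newest = c;
            }
        }
        return newest == null ? null : newest.value();
    }

    // Ungraceful exit: the MemStore is lost; surviving WAL edits are replayed on restart.
    void killAndRestart(boolean walDeleted) {
        memStore = new HashMap<>();
        if (walDeleted) {
            wal.clear();
        }
        for (Edit e : wal) {
            memStore.put(e.rowKey(), e.cell());
        }
    }

    int hfileCount() {
        return hfiles.size();
    }

    public static void main(String[] args) {
        FlushAndCrashSketch region = new FlushAndCrashSketch();
        region.put("row1", "v1");                 // Step 1: insert a Row ...
        region.flush();                           // ... and flush, creating one HFile
        System.out.println(region.get("row1"));  // Steps 2-3: prints v1
        region.put("row1", "v2");                 // Step 4: update the same RowKey
        System.out.println(region.get("row1"));  // Step 5: prints v2 (MemStore wins)
        System.out.println(region.hfileCount()); // prints 1: the HFile was not rewritten
        region.killAndRestart(true);              // Steps 6-7: WAL removed, RegionServer killed
        System.out.println(region.get("row1"));  // prints v1 again
    }
}
```

Calling `killAndRestart(false)` instead models a crash where the WAL survives: the edit is replayed into the MemStore and the read returns `v2`.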

 

Regards, Smarak

Rising Star

@smdas Adding another scenario to the picture. Let's say a row key is already in the BlockCache, and an update for it was just made. The BlockCache holds an existing value for that row key which is not the latest; the updated value is in the MemStore (and, if flushed, in an HFile). When a read occurs for the same row key, we look for the data first in the BlockCache and find the row key with the old value. How does HBase know that the value the BlockCache currently holds is not the latest, and that the latest has to be fetched from the MemStore or HFile?

Super Collaborator

Hi @pacman, a merge of the BlockCache and MemStore happens for every Read operation, so HBase doesn't need to know in advance which source is stale: the Cells from both are compared and the newest version is returned. As such, incorrect values aren't observed. Having said that, if you observe any Read/Write inconsistency, kindly share the use case and any reproduction attempts so we can review accordingly.
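One way to see why the BlockCache can't hold a "stale" row is that it caches blocks of HFiles, and HFiles are immutable: a later update lands in the MemStore and, after a flush, in a new HFile whose blocks are cached under different keys. The sketch below is a hypothetical toy cache keyed by HFile name and block offset, using a `LinkedHashMap` in access order for LRU eviction. It is not HBase's BlockCache implementation, and the `BlockKey` and `ToyLruBlockCache` names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy block cache; not HBase's BlockCache implementation.
public class BlockCacheSketch {
    // A cached block is identified by the (immutable) HFile it belongs to and its offset.
    record BlockKey(String hfileName, long offset) {}

    static final class ToyLruBlockCache extends LinkedHashMap<BlockKey, byte[]> {
        private final int maxBlocks;

        ToyLruBlockCache(int maxBlocks) {
            super(16, 0.75f, true); // access order: least recently used entry comes first
            this.maxBlocks = maxBlocks;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<BlockKey, byte[]> eldest) {
            return size() > maxBlocks; // evict when full, not after a fixed expiry time
        }
    }

    public static void main(String[] args) {
        ToyLruBlockCache cache = new ToyLruBlockCache(2);
        cache.put(new BlockKey("hfile-1", 0L), "row1=v1".getBytes(StandardCharsets.UTF_8));
        // A flush after an update writes a NEW HFile, so its blocks get new keys.
        cache.put(new BlockKey("hfile-2", 0L), "row1=v2".getBytes(StandardCharsets.UTF_8));
        // The hfile-1 block is still valid, because hfile-1 itself never changes.
        byte[] block = cache.get(new BlockKey("hfile-1", 0L));
        System.out.println(new String(block, StandardCharsets.UTF_8)); // prints row1=v1
        // Adding a third block evicts the least recently used one (hfile-2).
        cache.put(new BlockKey("hfile-3", 0L), "row2=x".getBytes(StandardCharsets.UTF_8));
        System.out.println(cache.containsKey(new BlockKey("hfile-2", 0L))); // prints false
    }
}
```

In this model, deciding which of `row1=v1` and `row1=v2` is current is the read merge's job (comparing timestamps), not the cache's; the cache only decides which immutable blocks stay in memory.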

 

Regards, Smarak