Is an inconsistent read possible in HBase with the MemStore and BlockCache?
Labels: Apache HBase, HDFS
Created on 12-08-2022 02:38 AM - last edited on 12-08-2022 11:11 PM by VidyaSargur
Consider a scenario where data has been written to an HFile in HBase. A read then occurs and the result is stored in the BlockCache. Next, an update arrives for that same data and is saved in the MemStore.
If the same data is read again, HBase will look in the BlockCache first and, if the cached entry has not yet expired, find the result there. If that cached data is returned to the client, it would be an inconsistent read.
Can this scenario occur, or is my understanding wrong?
Created ‎12-09-2022 07:31 AM
Hello @pacman
Thanks for using Cloudera Community. HBase handles this scenario through a Read Merge. A read "merges" KeyValues from the BlockCache, MemStore, and HFiles in the following steps:
- First, the Scanner looks for the row cells in the BlockCache (read cache).
- Next, the Scanner looks in the MemStore (write cache).
- If the Scanner does not find all of the row cells in the MemStore and BlockCache, it then reads the HFiles.
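The steps above can be pictured with a small toy model. This is only a sketch, not HBase code: `Cell`, `read_merge`, and the dictionaries standing in for the BlockCache, MemStore, and HFiles are hypothetical names invented for illustration. The one HBase fact it relies on is that cells are versioned by timestamp and a read returns the newest version.

```python
from typing import Dict, List, NamedTuple, Optional


class Cell(NamedTuple):
    value: str
    timestamp: int  # HBase cells are versioned by timestamp


def read_merge(row: str,
               block_cache: Dict[str, Cell],
               memstore: Dict[str, Cell],
               hfiles: List[Dict[str, Cell]]) -> Optional[str]:
    """Toy read merge: collect the versions of a row and return the newest."""
    candidates = []
    if row in block_cache:      # read cache: data loaded from HFiles earlier
        candidates.append(block_cache[row])
    if row in memstore:         # write cache: edits not yet flushed
        candidates.append(memstore[row])
    if not candidates:          # nothing in memory: fall back to the HFiles
        candidates.extend(f[row] for f in hfiles if row in f)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.timestamp).value


hfile = {"row1": Cell("old", timestamp=1)}
block_cache = {"row1": Cell("old", timestamp=1)}  # cached by an earlier read
memstore = {"row1": Cell("new", timestamp=2)}     # later update, not flushed

result = read_merge("row1", block_cache, memstore, [hfile])
print(result)  # new
```

Because both the cached cell and the MemStore cell take part in the merge, the older cached value never wins on its own.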
I hope this answers your query. If so, kindly mark the post as resolved.
Regards, Smarak
Created on ‎12-10-2022 10:04 AM - edited ‎12-10-2022 10:11 AM
@smdas So even if a key is updated in the MemStore but not in the BlockCache, the read merge updates the BlockCache values from the MemStore directly, without updating the HFile? Because for the HFile to be updated, a flush has to happen, right? Or are the MemStore and BlockCache checks done simultaneously?
Created on ‎12-13-2022 07:39 AM - edited ‎12-13-2022 07:40 AM
Hello @pacman, HFiles are only written when the MemStore is flushed. In the scenario you shared, the Read Merge returns the updated values without updating the HFile.
You can verify this with the following steps:
1. Create a table with 1 column family (CF). Insert 1 row and flush the table to ensure 1 HFile is created.
2. Read the table, which reads from the HFile and places the data in the BlockCache. The BlockCache size is visible in the HBase UI at the column family level within the table-level statistics.
3. Read the table again, which reads from the BlockCache. Verify this via the hit ratio in the BlockCache stats in the HBase UI.
4. Update the row using the same RowKey but a different value for the column qualifier within the column family.
5. Read the table again. You should get the updated value from Step 4.
6. To make things interesting, remove the WAL file of the RegionServer hosting the table's single region and kill the RegionServer PID. This ensures the MemStore isn't flushed, owing to the ungraceful exit, and the WAL can't be replayed.
7. Start the RegionServer, which creates a new WAL file. Read the table again. It shows the value from Step 1.
This helps confirm that the HFile isn't updated when the HBase Read Merge combines the values from the BlockCache and MemStore while reading the table in Step 5.
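For readers without a cluster at hand, the seven steps can be walked through against a toy in-memory model. This is a rough sketch under simplifying assumptions, not the HBase implementation: `ToyRegion` and all of its methods are hypothetical, timestamps are omitted because the MemStore always holds the newest edit here, and the BlockCache is modeled as a plain dictionary that is lost on restart.

```python
class ToyRegion:
    """Toy stand-in for one region: WAL and MemStore in memory, HFiles on 'disk'."""

    def __init__(self):
        self.hfiles = []        # flushed files, oldest first, never modified
        self.memstore = {}      # unflushed edits
        self.wal = []           # edit log, replayed after a crash
        self.block_cache = {}   # (hfile index, row) -> value
        self.cache_hits = 0

    def put(self, row, value):
        self.wal.append((row, value))
        self.memstore[row] = value

    def flush(self):
        if self.memstore:
            self.hfiles.append(dict(self.memstore))  # a new file per flush
            self.memstore.clear()
            self.wal.clear()

    def get(self, row):
        if row in self.memstore:                 # unflushed edit is the newest
            return self.memstore[row]
        for i in reversed(range(len(self.hfiles))):
            key = (i, row)
            if key in self.block_cache:
                self.cache_hits += 1
                return self.block_cache[key]
            if row in self.hfiles[i]:
                self.block_cache[key] = self.hfiles[i][row]
                return self.hfiles[i][row]
        return None

    def crash_and_restart(self, wal_lost):
        self.memstore = {}                       # memory is gone
        self.block_cache = {}
        if wal_lost:
            self.wal = []
        for row, value in self.wal:              # WAL replay, if any survived
            self.memstore[row] = value


region = ToyRegion()
region.put("row1", "v1")
region.flush()                                   # step 1
first = region.get("row1")                       # step 2: HFile read fills the cache
second = region.get("row1")                      # step 3: served from the cache
region.put("row1", "v2")                         # step 4
merged = region.get("row1")                      # step 5
region.crash_and_restart(wal_lost=True)          # step 6
after_restart = region.get("row1")               # step 7
print(first, second, merged, after_restart)      # v1 v1 v2 v1
```

The last value falls back to `v1` because the update only ever lived in the MemStore and the WAL, both of which were lost in step 6.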
Regards, Smarak
Created ‎12-13-2022 09:34 PM
@smdas Adding another scenario to the picture. Let's say a row key is already in the BlockCache and an update for it was just made. The BlockCache holds an existing value for that row key which is not the latest; the updated value is in the MemStore or, if flushed, in an HFile. When a read occurs for the same row key, HBase looks in the BlockCache first and finds the row key with the old value. How does HBase know that the value the BlockCache currently holds is not the latest, and that the latest has to be fetched from the MemStore or HFile?
Created ‎12-19-2022 12:15 AM
Hi @pacman, a merge across the BlockCache and MemStore happens for each read operation. Every cell carries a timestamp and the merge returns the newest version, so incorrect values aren't observed. Having said that, if you do observe any such read/write inconsistency, kindly share the use case and any attempt to reproduce it so that we can review accordingly.
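For the flushed case in particular, one way to picture why a cached block does not silently go stale is the toy model below. It is only a sketch with hypothetical names (`flush`, `load_block`, `get`) and one block per file, and it rests on two assumptions taken from how HBase is documented to work: HFiles are immutable once written, and BlockCache entries are keyed by HFile name and block offset.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]       # (value, timestamp)
Block = Dict[str, Cell]      # row -> cell; one block per file in this toy

hfiles: Dict[str, Block] = {}                    # file name -> contents, never rewritten
block_cache: Dict[Tuple[str, int], Block] = {}   # (file name, block offset) -> block


def flush(name: str, memstore: Block) -> None:
    """Write the MemStore out as a brand-new immutable file."""
    hfiles[name] = dict(memstore)
    memstore.clear()


def load_block(name: str, offset: int = 0) -> Block:
    """Return a block through the cache, reading the file only on a miss."""
    key = (name, offset)
    if key not in block_cache:
        block_cache[key] = hfiles[name]
    return block_cache[key]


def get(row: str, memstore: Block) -> str:
    """Merge the MemStore with every file's block; assumes the row exists."""
    cells: List[Cell] = [memstore[row]] if row in memstore else []
    for name in hfiles:
        block = load_block(name)
        if row in block:
            cells.append(block[row])
    return max(cells, key=lambda c: c[1])[0]     # newest timestamp wins


memstore: Block = {"row1": ("old", 1)}
flush("hfile-1", memstore)
before = get("row1", memstore)    # caches the block of hfile-1

memstore["row1"] = ("new", 2)
flush("hfile-2", memstore)        # the update lands in a new file
after = get("row1", memstore)     # hfile-1's cached block is still valid but loses the merge
print(before, after)              # old new
```

In this picture the cached block is never "wrong": it is an exact copy of a file that cannot change, and the newer cell simply arrives through a different file or the MemStore.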
Regards, Smarak
