Read merge in HBase

New Contributor

I am trying to understand how reads and writes happen in HBase and how HBase does caching. From various articles and videos, I found that a read merge happens when a read request is made to HBase. What I understood is:

  1. Whenever a read request is made, the block cache is checked first for the data.
  2. Then the memstore is checked. If the data is found in either the block cache or the memstore, it is sent to the client.
  3. Otherwise, it is fetched from the HFiles.

My doubts are:

  1. Are both the block cache and the memstore always checked for the data, or is the memstore ignored if the data is found in the block cache?
  2. If the memstore is not checked (because the data was found in the block cache), how will the client get the latest value if there was an edit in the memstore?
  3. I created a new table, added one row, and issued a get command to fetch the data. I got the data back, but I didn't see any change in the block cache's hit and read counts. Why?

I know there are multiple questions, but they are all linked to read merge and HBase caching. I need clarity on these concepts and could not find any in the documentation.


Super Collaborator

Hello @sachin_saju 


Thanks for using Cloudera Community. Your queries concerning the read path are discussed between a fellow community user and me in [1]. Kindly review it and let us know whether it answers your queries about the read path. In summary, the read path relies on a merge of BlockCache and MemStore before returning the output to the end user, thereby avoiding any inconsistent read. Refer to [2] for a few diagrams that help explain the read merge path.
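
The merge summarized above can be pictured with a toy model. This is only a sketch, not HBase code: `ToyStore` and its methods are hypothetical names, and it assumes each cell version carries a timestamp, that the block cache holds copies of flushed HFile data only, and that a read returns the version with the highest timestamp among everything it finds.

```python
class ToyStore:
    """Toy model of a single store; every name here is illustrative, not an HBase API."""

    def __init__(self):
        self.memstore = {}      # row -> (timestamp, value): edits not yet flushed
        self.hfiles = []        # each entry is a dict standing in for one flushed file
        self.block_cache = {}   # (file index, row) -> (timestamp, value)
        self.clock = 0          # stand-in for cell timestamps

    def put(self, row, value):
        # Writes land in the memstore only; flushed files are never modified.
        self.clock += 1
        self.memstore[row] = (self.clock, value)

    def flush(self):
        # A flush turns the current memstore contents into a new immutable file.
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore = {}

    def _read_from_file(self, index, row):
        # File data is read through the cache: fill it on first access.
        key = (index, row)
        if key not in self.block_cache:
            cell = self.hfiles[index].get(row)
            if cell is None:
                return None
            self.block_cache[key] = cell
        return self.block_cache[key]

    def get(self, row):
        # Collect candidates from the memstore and from every file, then merge.
        candidates = []
        if row in self.memstore:
            candidates.append(self.memstore[row])
        for index in range(len(self.hfiles)):
            cell = self._read_from_file(index, row)
            if cell is not None:
                candidates.append(cell)
        if not candidates:
            return None
        # The newest timestamp wins, wherever that version was found.
        return max(candidates, key=lambda cell: cell[0])[1]
```

In this model, a `put` that follows a `flush` and a cached read leaves the older value sitting in the cache, yet the next `get` still returns the newer value, because the memstore candidate carries the higher timestamp. That is the sense in which the merge avoids an inconsistent read.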


Concerning doubt #3, a community user asked a similar question in [3]. I haven't reviewed this use case around the hit/miss ratio in the UI internally, so I can't answer it myself. I shall therefore let our fellow HBase engineers answer [3], which may answer your Q3 as well.


Barring Q3, let me know if your first two queries are addressed by [1] and [2].


Regards, Smarak

New Contributor

Thank you for responding to the post.
I had seen article [2]. Based on [1], you are saying that for every operation a lookup is done in both the block cache and the memstore.


I tried to verify the theory, and that led to my question [3] about cache misses. If a merge happens for every read request (that is, a block cache lookup is done), why am I not seeing any change in cache hits?

Also, on the consistency point: if the scanner finds newer data in the memstore than in the block cache, will HBase update the block cache data to reflect the latest edit in the memstore?
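
One possible reading of the unchanged counters, offered as a hedged sketch and not as a confirmed answer from this thread: the block cache holds blocks read from HFiles, so a row that exists only in the memstore (a new table with nothing flushed yet) gives a read no HFile block to look up. The toy model below uses hypothetical names (`CountingStore`, `hits`, `misses`) and tracks a single version per row. Real counters also include index and bloom blocks and depend on cache configuration.

```python
class CountingStore:
    """Toy model with block cache counters; names are illustrative, not HBase APIs."""

    def __init__(self):
        self.memstore = {}   # row -> value: edits not yet flushed
        self.hfile = {}      # row -> value: stands in for flushed HFile blocks
        self.cached = set()  # rows whose HFile block is currently cached
        self.hits = 0
        self.misses = 0

    def put(self, row, value):
        self.memstore[row] = value

    def flush(self):
        # Move everything from the memstore into the flushed file.
        self.hfile.update(self.memstore)
        self.memstore.clear()

    def get(self, row):
        result = None
        # Only flushed data involves the block cache at all.
        if row in self.hfile:
            if row in self.cached:
                self.hits += 1
            else:
                self.misses += 1
                self.cached.add(row)
            result = self.hfile[row]
        # An unflushed edit is newer than anything flushed, so it wins the merge.
        if row in self.memstore:
            result = self.memstore[row]
        return result
```

In this model, a `put` followed by a `get` returns the row while `hits` and `misses` both stay at zero. After `flush()`, the first `get` counts as a miss and the second as a hit. If the same holds on a real cluster, running `flush 'table_name'` in the HBase shell before repeating the get should make the counters move.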



New Contributor

Hello @smdas, can you let me know, or tag, HBase engineers who could provide more clarity on my doubts, especially on the caching? It would be very helpful. Thanks