[Centralized Cache Management in HDFS] cache block report vs full cache state report

Hi Folks,

I am new to Hadoop and am experimenting with Centralized Cache Management in HDFS.

One thing I would like to understand better is the difference between the cache block report that the DN sends to the NN at each heartbeat and the full cache state report that it sends to the NN, whose frequency is controlled by dfs.cachereport.intervalMsec.

Per this doc: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.htm...

If I increase dfs.cachereport.intervalMsec, and thus decrease how often the DN sends the full report to the NN, what would the impact be, given that the DN sends a cache block report at every heartbeat?

Thanks a lot.

Re: [Centralized Cache Management in HDFS] cache block report vs full cache state report

Cache reports are how the NameNode learns where cached blocks are located. Each report is essentially a list of the block IDs currently cached by the DataNode.

Delaying these reports delays when cached block locations appear in the information the NameNode serves to its clients after the cache state changes (add, remove, timers, etc.).

Since changes to the block cache are mostly made asynchronously, this should not affect any specific commands. However, depending on how far you delay the reports (the default is every 10 seconds), clients seeking cached locations of recently cached or uncached blocks may see delayed or missed benefits.
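
To make the delay concrete, here is a minimal Python sketch. It assumes a simplified model in which the DataNode sends cache reports at exact multiples of the interval and the NameNode only learns of a cache change from the next report. The function name is hypothetical and not part of Hadoop.

```python
def visibility_delay_ms(change_time_ms: int, report_interval_ms: int) -> int:
    """Delay between a cache change and the first report at or after it.

    Assumption (simplified model): reports go out at t = 0, interval,
    2 * interval, ... and nothing else informs the NameNode of the change.
    """
    if report_interval_ms <= 0:
        raise ValueError("report interval must be positive")
    if change_time_ms < 0:
        raise ValueError("change time must be non-negative")
    # Ceiling division: index of the first report at or after the change.
    report_index = -(-change_time_ms // report_interval_ms)
    return report_index * report_interval_ms - change_time_ms


# A block cached 12 s after startup, under two different intervals.
print(visibility_delay_ms(12_000, 10_000))  # default 10 s interval
print(visibility_delay_ms(12_000, 60_000))  # interval raised to 60 s
```

In this model the worst-case delay is one full interval, so raising dfs.cachereport.intervalMsec widens the window in which clients are not told about newly cached blocks (or are still pointed at uncached ones).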

The regular DataNode heartbeats only send cache capacity statistics, not the actual block ID information.

The cache report should typically be a small list (an encoded array of block ID integers) and shouldn't impact the NameNode in any significant way unless you have very large caches. Are you observing otherwise?
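
For a rough sense of scale of the report size mentioned above, here is a back-of-the-envelope Python sketch. It assumes the cache is filled with full-size 128 MiB blocks and that each block ID costs 8 bytes unencoded. Both figures are assumptions for illustration (caches holding many small blocks would produce longer lists), and the function name is hypothetical.

```python
def estimate_report_bytes(
    cache_capacity_bytes: int,
    block_size_bytes: int = 128 * 1024 * 1024,  # assumed block size
    bytes_per_block_id: int = 8,                # assumed: raw 64-bit ID
) -> int:
    """Rough size of one cache report if the cache holds only full blocks."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    cached_blocks = cache_capacity_bytes // block_size_bytes
    return cached_blocks * bytes_per_block_id


# A DataNode with 64 GiB of cache under the assumptions above.
print(estimate_report_bytes(64 * 1024**3))
```

Under these assumptions the report is on the order of a few kilobytes per DataNode, which is consistent with it being cheap unless the cache is very large.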

Re: [Centralized Cache Management in HDFS] cache block report vs full cache state report

Thanks for your explanation, Harsh.