[Centralized Cache Management in HDFS] cache block report vs full cache state report

New Contributor

Hi Folks,

 

I am new to Hadoop and I am experimenting with Centralized Cache Management in HDFS.

One thing I would like to understand better is the difference between the cache block report that the DataNode (DN) sends to the NameNode (NN) at each heartbeat and the full cache state report that it sends to the NN at a frequency controlled by dfs.cachereport.intervalMsec.

 

Per this doc: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.htm...

 

If I increase dfs.cachereport.intervalMsec, and thus decrease the frequency at which the DN sends the full report to the NN, what would the impact be, given that the DN still sends a cache block report at every heartbeat?

 

Thanks a lot

 


Re: [Centralized Cache Management in HDFS] cache block report vs full cache state report

Master Guru
The cache reports are the basis of the NameNode's awareness of cached block locations. Each report is essentially a list of the block IDs currently cached by the DataNode.

Delaying the reports delays how quickly the NameNode can serve up-to-date cached block locations to its clients after the cache state changes (add/remove/timers/etc.).

Because changes to the block cache are mostly applied asynchronously, a longer interval should not affect any specific commands. However, clients looking for cached locations of recently cached or uncached blocks may see delayed or missed benefits, depending on how far you delay the reports (the default is every 10 seconds).
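
To make the delay concrete, here is a minimal sketch of the worst-case lag. It is a simplified model and not Hadoop code: it assumes full cache reports go out at fixed intervals starting at time 0 and ignores network and NameNode processing latency. The function name and parameters are hypothetical.

```python
def visibility_delay_ms(change_time_ms, report_interval_ms):
    """Time from a cache state change on the DataNode until the next
    full cache report is sent (simplified model: reports are sent at
    0, interval, 2*interval, ... with no other latency)."""
    if report_interval_ms <= 0:
        raise ValueError("report_interval_ms must be positive")
    remainder = change_time_ms % report_interval_ms
    # A change that lands exactly on a report tick is reported immediately.
    return 0 if remainder == 0 else report_interval_ms - remainder


if __name__ == "__main__":
    change_at = 25_000  # a block becomes cached 25 s after startup
    for interval in (10_000, 60_000):  # default vs. a raised interval
        delay = visibility_delay_ms(change_at, interval)
        print(f"interval={interval} ms -> visible after {delay} ms")
```

In this model the lag is always less than one report interval, so raising dfs.cachereport.intervalMsec from 10 seconds to 60 seconds raises the worst-case lag from just under 10 seconds to just under 60 seconds.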

The regular DataNode heartbeats only send cache capacity statistics, not the actual block ID information.

The cache report is typically a small list (an encoded array of block ID integers) and shouldn't affect the NameNode in any significant way unless you have very large caches. Are you observing otherwise?
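
For a rough sense of scale, the sketch below estimates the size of one report. The numbers are assumptions for illustration only: 8 bytes per block ID (the actual wire encoding may differ), a 128 MiB block size, and a cache filled entirely with full-size blocks. Many small cached files would mean more blocks and a larger report. The function name is hypothetical.

```python
def estimated_report_bytes(cache_bytes,
                           block_bytes=128 * 1024**2,
                           bytes_per_id=8):
    """Back-of-the-envelope size of one cache report's block ID list.

    Assumes every cached block is a full-size block and each block ID
    takes bytes_per_id bytes; both are illustrative assumptions."""
    cached_blocks = cache_bytes // block_bytes
    return cached_blocks * bytes_per_id


if __name__ == "__main__":
    cache = 64 * 1024**3  # 64 GiB of cache on one DataNode
    print(estimated_report_bytes(cache), "bytes per report")
```

Under these assumptions a 64 GiB cache holds 512 blocks, so a report carries about 4 KiB of block IDs per DataNode per interval.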

Re: [Centralized Cache Management in HDFS] cache block report vs full cache state report

New Contributor
Thanks for your explanation, Harsh.