DistributedMapCacheClientService (NiFi WebCrawler.xml template)

In the NiFi WebCrawler template located here:

https://github.com/hortonworks-gallery/nifi-templates/tree/master/templates

There is a "remove duplicates" processor that uses a DistributedMapCacheClientService. I tried to google/bing that, but I couldn't work out exactly what it is. Is it something I have to install, configure, or enable? If someone could point me to information on the Distributed Cache Service, what it is used for, and how to use it, I would greatly appreciate it (as you can probably guess, I'm pretty new to Hadoop).

Accepted Solution

Re: DistributedMapCacheClientService (NiFi WebCrawler.xml template)

The DistributedMapCache is a NiFi concept used to store information for later retrieval, either by the current processor or by another processor. There are two components: the DistributedMapCacheServer, which runs on one node if you are in a cluster, and the DistributedMapCacheClientService, which runs on all nodes if you are in a cluster and communicates with the server. Both are Controller Services, configured in NiFi through the controller section in the top right toolbar. Processors use the client service to store and retrieve data from the cache server. In this case, DetectDuplicate uses the cache to record what it has already seen and to determine whether something is a duplicate.
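
To make the idea concrete, here is a rough Python sketch of the duplicate check described above. It is not NiFi code and does not use the NiFi API: `InMemoryMapCache`, `put_if_absent`, and `route_duplicates` are hypothetical names, the cache is a plain in-process dictionary standing in for the cache server, and the route labels are only illustrative.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class InMemoryMapCache:
    """Hypothetical stand-in for the cache server: a plain key/value map."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        """Store value under key only if the key is new; return True if stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True

    def get(self, key: str) -> Optional[str]:
        """Return the cached value for key, or None if the key is unknown."""
        return self._entries.get(key)


def route_duplicates(cache: InMemoryMapCache, keys: Iterable[str]) -> List[Tuple[str, str]]:
    """Label each key by whether the cache has already seen it."""
    routed = []
    for key in keys:
        # The first sighting stores the key; any later sighting finds it present.
        is_new = cache.put_if_absent(key, "seen")
        routed.append((key, "non-duplicate" if is_new else "duplicate"))
    return routed


if __name__ == "__main__":
    cache = InMemoryMapCache()
    for key, route in route_duplicates(cache, ["page-a", "page-b", "page-a"]):
        print(key, route)
```

In a real flow the map lives in the DistributedMapCacheServer and each node reaches it through the client service, so every node checks against the same set of seen keys.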

Re: DistributedMapCacheClientService (NiFi WebCrawler.xml template)

Hi @Francis Apel

I believe the information you are looking for is here:

- https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.client.Distribut...

There you will find links to three pages about the distributed cache.

In short, it gives you a key/value map for storing information as data moves through your flow. The service is what you reference in a processor to link that processor to the map.

Hope this helps.

Re: DistributedMapCacheClientService (NiFi WebCrawler.xml template)

Any thoughts on how to clear this DMC cache? Suppose I have 4 entries in the DEPT_LKP table (DEPT_NO 10, 20, 30, and 40) and they get loaded into the DMC. If I later delete the DEPT_NO 20 entry from the source table, the DMC won't delete it from the cache. Worse, it will keep using the cached value for DEPT_NO 20.
