
DistributedMapCacheClientService (NiFi WebCrawler.xml template)

In the NiFi WebCrawler template located here:

https://github.com/hortonworks-gallery/nifi-templates/tree/master/templates

There is a "remove duplicates" processor that uses a DistributedMapCacheClientService. I searched for it but couldn't find a clear explanation of what it is. Is it something I have to install, configure, or enable? If someone could point me to information on the Distributed Cache Service, what it is used for, and how to use it, I would greatly appreciate it (as you can probably guess, I'm pretty new to Hadoop).

1 ACCEPTED SOLUTION

Master Guru

The DistributedMapCache is a NiFi concept used to store information for later retrieval, either by the current processor or by another processor. There are two components: the DistributedMapCacheServer, which runs on one node if you are in a cluster, and the DistributedMapCacheClientService, which runs on all nodes if you are in a cluster and communicates with the server. Both are Controller Services, configured in NiFi through the controller section in the top right toolbar. Processors use the client service to store and retrieve data from the cache server. In this case, DetectDuplicate uses the cache to record what it has already seen and to determine whether an incoming item is a duplicate.
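To make the duplicate-detection idea concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use the NiFi API: `InMemoryMapCache`, `put_if_absent`, and `is_duplicate` are hypothetical names invented for illustration, and a local dictionary stands in for the cache server. The example URLs are made up. It only shows the general "store the key if it is new, otherwise report a duplicate" pattern that a shared key/value cache makes possible.

```python
class InMemoryMapCache:
    """Hypothetical stand-in for the cache server: a shared key/value map."""

    def __init__(self):
        self._entries = {}

    def put_if_absent(self, key, value):
        """Store value only if key is new; return True if it was stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True

    def get(self, key):
        """Return the stored value, or None if the key is unknown."""
        return self._entries.get(key)


def is_duplicate(cache, identifier, description="seen"):
    """Return True if identifier was already recorded in the cache."""
    # A failed put_if_absent means the key already existed, i.e. a duplicate.
    return not cache.put_if_absent(identifier, description)


cache = InMemoryMapCache()
# Made-up identifiers; in a crawler flow these might be page URLs.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
results = [is_duplicate(cache, url) for url in urls]
print(results)  # [False, False, True]
```

In a real flow the map would live in the cache server rather than in the processor's own memory, so every node that talks to the server through its client service would see the same set of keys.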

3 REPLIES

Hi @Francis Apel

I believe the information you are looking for is here:

- https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.client.Distribut...

That page links to three pages about the distributed cache.

In short, it gives you a key/value map in which you can store information as data moves through your flow. The service is what you reference in a processor's configuration to connect that processor to the map.

Hope this helps.

Rising Star

Any thoughts on how to clear this DMC cache? Suppose I have 4 entries in the DEPT_LKP table, DEPT_NO 10, 20, 30, and 40, and they get loaded into the DMC. If I later delete the DEPT_NO 20 entry from the source table, the DMC won't delete it from the cache. Worse, lookups will keep using the cached value for DEPT_NO 20.
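The stale-entry problem described above can be illustrated with a small sketch in plain Python. This is not a NiFi feature or API: a dictionary stands in for the cache, `refresh_cache` is a hypothetical helper, and the department values are placeholders. It only shows the general idea of reconciling a cache against its source by dropping keys the source no longer contains.

```python
def refresh_cache(cache, source_rows):
    """Make cache match source_rows and return the keys that were dropped.

    Both arguments are plain dicts; this is an illustration, not NiFi code.
    """
    # Keys present in the cache but no longer in the source are stale.
    stale = set(cache) - set(source_rows)
    for key in stale:
        del cache[key]
    # Add new keys and overwrite changed values from the source.
    cache.update(source_rows)
    return sorted(stale)


# Placeholder values; only the DEPT_NO keys come from the question above.
dept_cache = {10: "dept-10", 20: "dept-20", 30: "dept-30", 40: "dept-40"}
source_table = {10: "dept-10", 30: "dept-30", 40: "dept-40"}  # 20 was deleted

removed = refresh_cache(dept_cache, source_table)
print(removed)             # [20]
print(sorted(dept_cache))  # [10, 30, 40]
```

Without a reconciliation step of this kind, a lookup for key 20 would keep returning the old value, which is the behavior the question describes.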
