How can we configure controller services for the "DetectDuplicate" processor in Apache NiFi?

I want to use the "DetectDuplicate" processor to remove duplicate JSON content (for example, duplicate tweets) and merge the remaining content into a single file.

Can someone help me with this? @Jeremy Dyer, @Matt Burgess

Thanks in advance.

1 ACCEPTED SOLUTION

Hi @Yogesh Sharma,

First, you need to extract a value from your JSON that serves as an identifier of the content and store it in a FlowFile attribute.

Let's say you have:

{"id":"myId", "name":"foo", ...}

You can use an EvaluateJsonPath processor to extract the value of "id" into a FlowFile attribute: set the processor's Destination property to flowfile-attribute, then add a property with name = id and value = $.id
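To make the extraction step concrete, here is a minimal Python sketch of what the JSONPath expression $.id selects. It is only an illustration of the idea, not NiFi code; the function name extract_id is hypothetical.

```python
import json


def extract_id(flowfile_content: str) -> dict:
    """Illustrate what an EvaluateJsonPath property `id = $.id` would produce.

    Hypothetical helper for illustration only; NiFi does this internally.
    """
    document = json.loads(flowfile_content)
    # $.id selects the top-level "id" field of the JSON document.
    return {"id": str(document["id"])}


attributes = extract_id('{"id": "myId", "name": "foo"}')
print(attributes)
```

The resulting "id" attribute is what the later duplicate check is keyed on.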

Then you can route FlowFiles to your DetectDuplicate processor. This processor needs a map cache service. To set it up, go to the controller services panel and create two controller services:

- a DistributedMapCacheServer with the default settings

- a DistributedMapCacheClientService with Server Hostname set to localhost so that it uses the DistributedMapCacheServer you created.

Then start the two services and, in your DetectDuplicate processor, reference the DistributedMapCacheClientService you defined. Also set the processor's Cache Entry Identifier property to ${id} so that duplicates are detected on the attribute you extracted (the default is ${hash.value}).
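Conceptually, the duplicate check is a lookup in a shared key/value cache. The following Python sketch shows that logic under simplifying assumptions: the cache is a plain in-memory dict standing in for the distributed map cache, FlowFiles are plain dicts, and the function name detect_duplicates is hypothetical. The two output names mirror the processor's "duplicate" and "non-duplicate" relationships.

```python
def detect_duplicates(flowfiles, cache=None):
    """Route each FlowFile by whether its "id" attribute was seen before.

    Illustration only: `cache` is an in-memory dict standing in for the
    distributed map cache, and FlowFiles are plain dicts.
    """
    cache = {} if cache is None else cache
    routed = {"non-duplicate": [], "duplicate": []}
    for flowfile in flowfiles:
        key = flowfile["attributes"]["id"]
        if key in cache:
            # The identifier is already cached, so this is a repeat.
            routed["duplicate"].append(flowfile)
        else:
            # First time this identifier is seen: remember it.
            cache[key] = True
            routed["non-duplicate"].append(flowfile)
    return routed


tweets = [
    {"attributes": {"id": "a"}, "content": "first"},
    {"attributes": {"id": "b"}, "content": "second"},
    {"attributes": {"id": "a"}, "content": "first again"},
]
result = detect_duplicates(tweets)
print(len(result["non-duplicate"]), len(result["duplicate"]))
```

Only the FlowFiles in "non-duplicate" would then be sent on to be merged into a single file.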

Hope this helps.

Thanks Pierre Villard. My NiFi is installed as a cluster, so which settings do I need to specify in "DistributedMapCacheClientService"? I also read somewhere that we need to set "nifi.controller.service.configuration.file" in the "nifi.properties" file.

Can you shed some light on this as well?

Have a look here: https://community.hortonworks.com/articles/9203/how-to-migrate-a-standalone-nifi-into-a-nifi-clust.h...

It is advised to run the DistributedMapCacheServer on the NCM (NiFi Cluster Manager). Then, in the DistributedMapCacheClientService, use the IP address of your NCM instead of localhost.
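Before pointing every node at the NCM, it can help to confirm that the cache server's port is reachable from each node. Below is a small Python sketch for that check. It assumes the server listens on port 4557 (the default, as far as I know; adjust it if you changed the Port property), and the helper name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Assumption: replace 127.0.0.1 with the IP address of your NCM.
    print(can_reach("127.0.0.1", 4557))
```

A True result only shows that the port is open; it does not verify that the service answering is the DistributedMapCacheServer.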