DetectDuplicate is not working as expected

Hi all,

I am having problems with DetectDuplicate. Either it is not working as expected or I don't know how to configure it. Am I missing something?

Imagine this simple JSON list:

[ {
  "ID": 101
}, {
  "ID": 102
}, {
  "ID": 103
}, {
  "ID": 104
}, {
  "ID": 105
}, {
  "ID": 106
}, {
  "ID": 107
}, {
  "ID": 108
}, {
  "ID": 109
}, {
  "ID": 110
} ]

Looking at the JSON list above, we would expect every item to be treated as a non-duplicate by the Distributed Map Cache Server. But that is not what happens.

Here is the DetectDuplicate property configuration:

20473-11-detect-duplicate-properties.png

When I start the flow, look what happens:

20474-12-result.png

Only the first ID is detected as a non-duplicate, as you can see in the data provenance for LogAttribute - Non Duplicate:

20475-13-data-provenance-log-attribute-non-duplicate.png

What am I doing wrong? Am I missing some configuration setting?

Here is the template: detect-duplicate.xml

Any help will be much appreciated!

Thank you in advance.

1 ACCEPTED SOLUTION

Hi @Gabriel Queiroz,

If you'd like to use the ID FlowFile attribute in the DetectDuplicate processor's 'Cache Entry Identifier', you need to use NiFi Expression Language syntax. You currently have it configured as '$ID', but it needs to be '${ID}' (wrap the attribute name in curly braces).
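
To illustrate why only the first ID came out as a non-duplicate, here is a rough Python sketch of the behaviour. It is not NiFi's implementation: `evaluate` and `detect` are hypothetical helpers, the set stands in for the Distributed Map Cache, and it assumes that text without the `${...}` wrapper is used literally, so every FlowFile ends up with the same cache key.

```python
import re

def evaluate(expression, attributes):
    """Replace each ${name} reference with the attribute value; other text stays literal."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(attributes.get(m.group(1), "")),
                  expression)

def detect(expression, flowfiles):
    """Return (non_duplicate_ids, duplicate_ids); a set stands in for the cache."""
    cache = set()
    non_duplicates, duplicates = [], []
    for attrs in flowfiles:
        key = evaluate(expression, attrs)
        if key in cache:
            duplicates.append(attrs["ID"])
        else:
            cache.add(key)
            non_duplicates.append(attrs["ID"])
    return non_duplicates, duplicates

# One FlowFile per item of the JSON list, each carrying an ID attribute.
flowfiles = [{"ID": i} for i in range(101, 111)]

literal_result = detect("$ID", flowfiles)   # key is always the text "$ID"
el_result = detect("${ID}", flowfiles)      # key is the value of the ID attribute

print(literal_result)
print(el_result)
```

With '$ID' the sketch reports only 101 as a non-duplicate, matching the result in the question; with '${ID}' all ten IDs are non-duplicates.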


4 REPLIES

Hi @kkawamura,

You are saving me again!

In this question https://community.hortonworks.com/questions/110551/how-to-remove-a-cache-entry-identifier-from-distr... you posted an example https://gist.github.com/ijokarumawak/14d560fec5a052b3a157b38a11955772 and your example is here in my NiFi. I looked at it several times, but I didn't pay attention to this:

20487-kkawamura-remove-cache-example.png

My fault! I'm ashamed!

Thank you very much again @kkawamura!

New Contributor

Hi All,

I am having problems with DetectDuplicate. Either it is not working as expected or I don't know how to configure it. Am I missing something?

Imagine this simple JSON:

{
  "data": {
    "alertaID": "xxxxx",
    "app": "BSS",
    "node": "Weblogic",
    "severity": "critical",
    "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent",
    "hashField1": "BSS_Pcriticalcom.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercentWeblogic",
    "hashField2": "criticalWeblogic",
    "hashField3": "criticalBSS_P"
  }
}

 

Looking at the JSON above, we expect every item to be treated as a non-duplicate by the Redis Distributed Map Cache Client, based on the Cache Entry Identifier. But that is not what happens.

Here is the DetectDuplicate property configuration:


Cache Entry Identifier: $.data.app::$.data.severity::$.data.type::$.data.node

Age Off Duration: 5 mins


I expect inputs with the same values for data.app, data.severity, data.type, and data.node to be considered duplicates until the Age Off Duration expires, and any input with a different value in any of those fields to be considered a non-duplicate.
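
The expected behaviour described above can be sketched in Python as follows. This is only a model of the intent, not NiFi or Redis code. It assumes the four JSON fields have already been extracted into FlowFile attributes (for example with EvaluateJsonPath) under the hypothetical names `app`, `severity`, `type`, and `node`, because Cache Entry Identifier is evaluated with Expression Language against attributes rather than as JSONPath against the content. `make_key` and `AgeOffCache` are hypothetical names, and the sketch assumes an aged-off entry is treated as new and cached again.

```python
def make_key(attrs):
    """Build the composite key, like ${app}::${severity}::${type}::${node}."""
    return "::".join(attrs[name] for name in ("app", "severity", "type", "node"))

class AgeOffCache:
    """In-memory stand-in for the map cache, with an age-off window in seconds."""

    def __init__(self, age_off_seconds):
        self.age_off_seconds = age_off_seconds
        self.first_seen = {}  # key -> time the entry was cached

    def is_duplicate(self, key, now):
        seen = self.first_seen.get(key)
        if seen is not None and now - seen < self.age_off_seconds:
            return True
        # Unknown or aged-off key: cache it again and report a non-duplicate.
        self.first_seen[key] = now
        return False

alert = {
    "app": "BSS",
    "severity": "critical",
    "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent",
    "node": "Weblogic",
}
other_node = {**alert, "node": "Other"}  # differs in one field only

cache = AgeOffCache(age_off_seconds=5 * 60)  # Age Off Duration: 5 mins
results = [
    cache.is_duplicate(make_key(alert), now=0),       # first time seen
    cache.is_duplicate(make_key(alert), now=60),      # same key within 5 mins
    cache.is_duplicate(make_key(other_node), now=60), # different node
    cache.is_duplicate(make_key(alert), now=400),     # same key after age-off
]
print(results)
```

The times are passed in explicitly so the sketch is deterministic; a real flow would rely on the cache service's own clock.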

 

 

 

 

Community Manager

@PoonamB 

 

As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment that could help others give a more accurate answer to your question.


Cy Jervis, Manager, Community Program