DetectDuplicate is not working as expected
Labels: Apache NiFi
Created on ‎07-17-2017 02:59 PM - edited ‎08-18-2019 01:58 AM
Hi all,
I am having problems with DetectDuplicate. Either it is not working as expected or I do not know how to configure it. Am I missing something?
Imagine this simple JSON list:
[ { "ID": 101 }, { "ID": 102 }, { "ID": 103 }, { "ID": 104 }, { "ID": 105 }, { "ID": 106 }, { "ID": 107 }, { "ID": 108 }, { "ID": 109 }, { "ID": 110 } ]
Looking at the JSON list above, we would expect every item to be a non-duplicate in the Distributed Map Cache Server. But that is not what happens.
Here is the DetectDuplicate property configuration:
When I start the flow, look at what happens:
Only the first ID is detected as a non-duplicate, as you can see in the LogAttribute - Non Duplicate data provenance:
What am I doing wrong? Am I missing some configuration setting?
Here is the template: detect-duplicate.xml
Any help will be much appreciated!
Thank you in advance.
Created ‎07-18-2017 02:16 AM
Hi @Gabriel Queiroz,
If you'd like to use the ID FlowFile attribute as the DetectDuplicate processor's 'Cache Entry Identifier', you need to use NiFi Expression Language syntax. You have currently configured it as '$ID', but it needs to be '${ID}' (wrap the attribute name in curly braces).
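To see why the unbraced form would produce the behaviour in the question, here is a rough Python sketch (not NiFi code). It assumes that a value without `${...}` is treated as literal text, so every FlowFile ends up with the same cache key. The names `resolve_identifier` and `detect_duplicates` are hypothetical, and the regex only handles plain `${name}` references, not the full Expression Language.

```python
import re

def resolve_identifier(template, attributes):
    """Substitute plain ${name} references with FlowFile attribute values.

    Simplified stand-in for Expression Language evaluation (an assumption):
    text without ${...}, such as '$ID', is returned unchanged as a literal.
    """
    return re.sub(r"\$\{(\w+)\}", lambda m: str(attributes.get(m.group(1), "")), template)

def detect_duplicates(template, flowfiles):
    """Return (non_duplicates, duplicates), using a set as the cache."""
    cache = set()
    non_duplicates, duplicates = [], []
    for attributes in flowfiles:
        key = resolve_identifier(template, attributes)
        if key in cache:
            duplicates.append(attributes)
        else:
            cache.add(key)
            non_duplicates.append(attributes)
    return non_duplicates, duplicates

# Ten FlowFiles carrying the IDs 101..110 from the question.
flowfiles = [{"ID": i} for i in range(101, 111)]

literal_non_dup, literal_dup = detect_duplicates("$ID", flowfiles)
braced_non_dup, braced_dup = detect_duplicates("${ID}", flowfiles)

print(len(literal_non_dup), len(literal_dup))
print(len(braced_non_dup), len(braced_dup))
```

With the literal '$ID' all ten FlowFiles share one key, so only the first counts as a non-duplicate; with '${ID}' each FlowFile gets its own key.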
Created on ‎07-18-2017 03:27 PM - edited ‎08-18-2019 01:58 AM
Hi @kkawamura,
You are saving me again!
In this question https://community.hortonworks.com/questions/110551/how-to-remove-a-cache-entry-identifier-from-distr... you sent an example https://gist.github.com/ijokarumawak/14d560fec5a052b3a157b38a11955772. That example is here in my NiFi and I looked at it several times, but I didn't pay attention to this:
My fault! I'm ashamed!
Thank you very much again @kkawamura!
Created ‎04-27-2020 04:37 AM
Hi All,
I am having problems with DetectDuplicate. Either it is not working as expected or I do not know how to configure it. Am I missing something?
Imagine this simple JSON:
{
  "data": {
    "alertaID": "xxxxx",
    "app": "BSS",
    "node": "Weblogic",
    "severity": "critical",
    "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent",
    "hashField1": "BSS_Pcriticalcom.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercentWeblogic",
    "hashField2": "criticalWeblogic",
    "hashField3": "criticalBSS_P"
  }
}
Looking at the JSON above, we expect every item to be treated as a non-duplicate in the Redis Distributed Map Cache Client, based on the Cache Entry Identifier. But that is not what is happening.
Here is the DetectDuplicate property configuration:
Cache Entry Identifier:
$.data.app::$.data.severity::$.data.type::$.data.node
Age Off Duration: 5 mins
I expect inputs with the same values for data.app, data.severity, data.type, and data.node to be considered duplicates until the Age Off Duration expires, and any input with a different value in any of those fields to be considered a non-duplicate.
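A possible cause, not confirmed in this thread: 'Cache Entry Identifier' is evaluated with NiFi Expression Language against FlowFile attributes, so a JsonPath string such as `$.data.app` is probably used as literal text rather than read from the content. One common approach is to extract the fields into attributes first (for example with EvaluateJsonPath) and then reference them as `${app}::${severity}::${type}::${node}`; those attribute names are assumptions. The Python sketch below only illustrates the expected behaviour of a composite key with age-off. `AgeOffCache` and `cache_key` are hypothetical names, not NiFi APIs, and the "warning" severity is made up for the comparison.

```python
import json

AGE_OFF_SECONDS = 5 * 60  # mirrors the 5-minute Age Off Duration

def cache_key(document):
    """Join the four fields with '::' to form the duplicate-detection key."""
    data = document["data"]
    return "::".join([data["app"], data["severity"], data["type"], data["node"]])

class AgeOffCache:
    """In-memory stand-in for a distributed map cache with age-off."""

    def __init__(self, age_off_seconds):
        self.age_off_seconds = age_off_seconds
        self._first_seen = {}

    def is_duplicate(self, key, now):
        """Return True if key was cached less than age_off_seconds ago."""
        seen_at = self._first_seen.get(key)
        if seen_at is not None and now - seen_at < self.age_off_seconds:
            return True
        self._first_seen[key] = now  # new or expired entry: (re)cache it
        return False

alert = json.loads("""
{"data": {"app": "BSS", "node": "Weblogic", "severity": "critical",
          "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent"}}
""")
other = json.loads(json.dumps(alert))   # deep copy via JSON round trip
other["data"]["severity"] = "warning"   # hypothetical differing value

cache = AgeOffCache(AGE_OFF_SECONDS)
first = cache.is_duplicate(cache_key(alert), now=0)            # first sighting
repeat = cache.is_duplicate(cache_key(alert), now=60)          # same key, 1 minute later
different = cache.is_duplicate(cache_key(other), now=61)       # one field differs
after_age_off = cache.is_duplicate(cache_key(alert), now=301)  # entry has aged off
print(first, repeat, different, after_age_off)
```

Only the second call reports a duplicate: the key repeats within five minutes, while a differing field or an expired entry is treated as new.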
Created ‎04-27-2020 06:08 AM
As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment that could help others give a more accurate answer to your question.
Cy Jervis, Manager, Community Program
