DetectDuplicate is not working as expected


Hi all,

I am having problems with DetectDuplicate. Either it is not working as expected or I do not know how to configure it. Am I missing something?

Consider this simple JSON list:

[ {
  "ID": 101
}, {
  "ID": 102
}, {
  "ID": 103
}, {
  "ID": 104
}, {
  "ID": 105
}, {
  "ID": 106
}, {
  "ID": 107
}, {
  "ID": 108
}, {
  "ID": 109
}, {
  "ID": 110
} ]

Looking at the JSON list above, we expect every item to be treated as a non-duplicate by the Distributed Map Cache Server, since every ID is unique. But that is not what happens.

Here is the DetectDuplicate properties configuration:

20473-11-detect-duplicate-properties.png

When I start the flow, look at what happens:

20474-12-result.png

Only the first ID is detected as a non-duplicate, as you can see in the data provenance of the "LogAttribute - Non Duplicate" processor:

20475-13-data-provenance-log-attribute-non-duplicate.png

What am I doing wrong? Am I missing some configuration setting?

Here is the template: detect-duplicate.xml

Any help will be much appreciated!

Thank you in advance.

1 ACCEPTED SOLUTION


Re: DetectDuplicate is not working as expected

Hi @Gabriel Queiroz,

If you'd like to use the ID FlowFile attribute as the DetectDuplicate processor's 'Cache Entry Identifier', you need to use NiFi Expression Language syntax. You currently have it configured as '$ID', but it needs to be '${ID}' (wrap the attribute name in curly braces).
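
To illustrate why the missing braces lead to the result in the question, here is a minimal Python sketch. It is a simplified model, not NiFi's implementation: `evaluate_identifier` and `detect_duplicates` are hypothetical helpers, the set stands in for the Distributed Map Cache Server, and only the plain `${name}` form of Expression Language is modeled. The assumption is that '$ID' without braces is treated as literal text, so every FlowFile produces the same cache key.

```python
import re


def evaluate_identifier(template, attributes):
    """Hypothetical stand-in for Expression Language: only ${name} is substituted."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: str(attributes.get(match.group(1), "")),
        template,
    )


def detect_duplicates(template, flowfiles):
    """Return the IDs that would be routed to 'non-duplicate' for a given template."""
    cache = set()  # stands in for the Distributed Map Cache Server
    non_duplicates = []
    for attributes in flowfiles:
        key = evaluate_identifier(template, attributes)
        if key not in cache:
            cache.add(key)
            non_duplicates.append(attributes["ID"])
    return non_duplicates


# One FlowFile per item of the JSON list in the question, with ID as an attribute.
flowfiles = [{"ID": i} for i in range(101, 111)]

print(detect_duplicates("$ID", flowfiles))    # literal text: one shared key
print(detect_duplicates("${ID}", flowfiles))  # attribute reference: one key per ID
```

With '$ID' only the first FlowFile counts as a non-duplicate, which matches the behavior reported in the question; with '${ID}' all ten do.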


Re: DetectDuplicate is not working as expected

Hi @kkawamura,

You are saving me again!

In this question https://community.hortonworks.com/questions/110551/how-to-remove-a-cache-entry-identifier-from-distr... you sent an example https://gist.github.com/ijokarumawak/14d560fec5a052b3a157b38a11955772 and your example is here in my NiFi. I looked at it several times, but I didn't pay attention to this:

20487-kkawamura-remove-cache-example.png

My fault! I'm ashamed!

Thank you very much again @kkawamura!

Re: DetectDuplicate is not working as expected

New Contributor

Hi All,

I am having problems with DetectDuplicate. Either it is not working as expected or I do not know how to configure it. Am I missing something?

Consider this simple JSON document:

{
  "data": {
    "alertaID": "xxxxx",
    "app": "BSS",
    "node": "Weblogic",
    "severity": "critical",
    "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent",
    "hashField1": "BSS_Pcriticalcom.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercentWeblogic",
    "hashField2": "criticalWeblogic",
    "hashField3": "criticalBSS_P"
  }
}

 

Looking at the JSON above, we expect every item with a distinct Cache Entry Identifier to be treated as a non-duplicate in the Redis Distributed Map Cache Client. But that is not what happens.

Here is the DetectDuplicate properties configuration:

Cache Entry Identifier: $.data.app::$.data.severity::$.data.type::$.data.node
Age Off Duration: 5 mins


I expect inputs with the same values for data.app, data.severity, data.type, and data.node to be considered duplicates until the Age Off Duration expires, and any input with a different value in any of those fields to be considered a non-duplicate.
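
The expected behavior described above can be sketched in Python as a composite key with an age-off window. This is only a model of the intent, not NiFi's implementation: `make_key` and `AgeOffCache` are hypothetical names, and the timestamps are passed in explicitly to keep the example deterministic. It also assumes the four fields are already available as plain values (in NiFi, most likely as FlowFile attributes extracted beforehand, for example with EvaluateJsonPath), since as far as I know the Cache Entry Identifier is evaluated with Expression Language against attributes rather than with JSONPath against the content.

```python
def make_key(data):
    """Join the four fields with '::', mirroring the identifier in the post."""
    return "::".join([data["app"], data["severity"], data["type"], data["node"]])


class AgeOffCache:
    """Hypothetical in-memory stand-in for the cache with an age-off window."""

    def __init__(self, age_off_seconds):
        self.age_off_seconds = age_off_seconds
        self.first_seen = {}  # key -> time the key was last stored

    def is_duplicate(self, key, now):
        seen_at = self.first_seen.get(key)
        if seen_at is not None and now - seen_at < self.age_off_seconds:
            return True
        # Unknown or aged-off key: store it and treat the input as a non-duplicate.
        self.first_seen[key] = now
        return False


alert = {
    "app": "BSS",
    "node": "Weblogic",
    "severity": "critical",
    "type": "com.bea/CM49-Server/CM49-Server/JVMRuntime/HeapFreePercent",
}
other = dict(alert, severity="warning")  # differs in one field only

cache = AgeOffCache(age_off_seconds=300)  # 5 minutes
results = [
    cache.is_duplicate(make_key(alert), now=0),    # first occurrence
    cache.is_duplicate(make_key(alert), now=60),   # same fields, within 5 minutes
    cache.is_duplicate(make_key(other), now=60),   # one field differs
    cache.is_duplicate(make_key(alert), now=301),  # same fields, after age-off
]
print(results)
```

Only the second call reports a duplicate: the key was seen 60 seconds earlier, while the other calls see either a new key or one that has aged off.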

 

 

 

 

Re: DetectDuplicate is not working as expected

Community Manager

@PoonamB 

 

As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment that could help others give a more accurate answer to your question.


Cy Jervis, Community Manager
