Created 09-07-2017 03:12 PM
Hello,
Is it possible to compare the attributes of two different flowfiles and pass only one of them if the comparison matches?
Thank you,
Jon
Created 09-12-2017 12:45 PM
Created 09-12-2017 12:28 PM
Have you checked out NiFi's RouteOnAttribute processor? It can compare the attributes of incoming flowfiles and handle accordingly based on the routing strategy you select.
Created 09-12-2017 12:49 PM
Yes, I've tried to use RouteOnAttribute, but the thing is that I want to compare the attributes of two different flowfiles... and as far as I understand, RouteOnAttribute doesn't allow this kind of comparison... tell me if I'm wrong!
Created 09-12-2017 12:53 PM
Ah, I overlooked the "only pass one" goal in the original question. As @Matt Clarke mentioned, it looks like DetectDuplicate might help with that part.
Created 09-12-2017 12:48 PM
Hi @Matt Clarke,
I just want to compare some attributes from both flowfiles... I'll try with that processor and I'll be back!
Created 09-13-2017 09:38 AM
Hi @Matt Clarke,
That processor did the trick. It's exactly what I was looking for. Thank you so much.
Best,
Jon
Created 09-13-2017 12:48 PM
Glad this worked for you.
As for your new question:
The value written to the DistributedMapCache remains in the cache for a configured amount of time or until a configured maximum number of entries is reached. So you can compare many FlowFiles against this stored value: any FlowFile that matches a stored value is considered a duplicate. It is not a one-time match of a single duplicate.
It would be very expensive to build a NiFi processor that reads large batches of queued FlowFiles from an inbound queue to compare FlowFile attributes. FlowFile attributes live in heap memory, so the more FlowFiles you pull in for a comparison, the more likely you are to encounter an Out Of Memory error. And if you limit the size of the batches, how do you know a given batch contains the actual FlowFiles you want to compare?
This is why DetectDuplicate makes use of an external cache service and compares FlowFiles against a stored value one FlowFile at a time.
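To make the one-at-a-time comparison concrete, here is a minimal sketch in plain Python. It is only a toy model of the behavior described above, not NiFi code or the NiFi API: `MapCacheModel`, `detect_duplicate`, the FIFO eviction choice, and the `order.id` attribute are all assumptions made up for illustration.

```python
from collections import OrderedDict


class MapCacheModel:
    """Toy stand-in for a map cache with a maximum entry count (hypothetical)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # key -> description of the first FlowFile seen

    def get_and_put_if_absent(self, key, description):
        """Return the stored description if the key exists; otherwise store it and return None."""
        if key in self.entries:
            return self.entries[key]
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest entry (FIFO, assumed)
        self.entries[key] = description
        return None


def detect_duplicate(flowfile_attributes, cache, key_attribute):
    """Route a single FlowFile by comparing one attribute against the cache."""
    key = flowfile_attributes[key_attribute]
    original = cache.get_and_put_if_absent(key, flowfile_attributes["filename"])
    return "non-duplicate" if original is None else "duplicate"


cache = MapCacheModel(max_entries=2)
flowfiles = [
    {"filename": "a.csv", "order.id": "100"},
    {"filename": "b.csv", "order.id": "100"},
    {"filename": "c.csv", "order.id": "100"},
    {"filename": "d.csv", "order.id": "200"},
]
# Only one FlowFile's attributes are examined at a time; the cache holds the rest.
routes = [detect_duplicate(ff, cache, "order.id") for ff in flowfiles]
print(routes)  # ['non-duplicate', 'duplicate', 'duplicate', 'non-duplicate']
```

Note that `b.csv` and `c.csv` are both routed as duplicates: the stored value keeps matching for as long as it stays in the cache.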
Thanks,
Matt
Created 09-15-2017 12:03 PM
Hi @Matt Clarke,
Thank you. So, how about clearing this cache eventually? Is it possible to clear an entry whenever a duplicate is found? I've been trying the Eviction Strategy property but I'm not getting anywhere so far.
Thanks!
Created 09-15-2017 01:42 PM
There are no dedicated processors for removing cached entries from the distributed map cache.
You can try using the "Age Off Duration" property in the DetectDuplicate processor, or use a scripting processor in NiFi to execute a script that clears the cache.
The following Jira covers this missing processor and also provides a sample template
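As a rough illustration of the two options above, the following plain Python sketch models an age-off check and a "remove the entry once a duplicate is found" policy. It is not NiFi code and does not use the NiFi scripting API: `ExpiringCacheModel`, its `check` method, and the `remove_on_duplicate` flag are hypothetical names, and the time is passed in explicitly so the example is deterministic.

```python
class ExpiringCacheModel:
    """Toy cache whose entries age off after a fixed duration (hypothetical model)."""

    def __init__(self, age_off_seconds):
        self.age_off_seconds = age_off_seconds
        self.entries = {}  # key -> time (in seconds) at which the key was stored

    def check(self, key, now, remove_on_duplicate=False):
        """Return 'duplicate' or 'non-duplicate' for one key at time `now`."""
        stored_at = self.entries.get(key)
        # An entry older than the age-off duration is treated as if it were absent.
        if stored_at is not None and now - stored_at >= self.age_off_seconds:
            del self.entries[key]
            stored_at = None
        if stored_at is None:
            self.entries[key] = now
            return "non-duplicate"
        if remove_on_duplicate:
            # Clear the entry so the next FlowFile with this key starts fresh.
            del self.entries[key]
        return "duplicate"


cache = ExpiringCacheModel(age_off_seconds=60)
results = [
    cache.check("100", now=0),    # first sighting, stored
    cache.check("100", now=10),   # matches the stored entry
    cache.check("100", now=70),   # entry aged off, stored again
    cache.check("100", now=80, remove_on_duplicate=True),  # match, then entry removed
    cache.check("100", now=90),   # cache was cleared, so stored again
]
print(results)
# ['non-duplicate', 'duplicate', 'non-duplicate', 'duplicate', 'non-duplicate']
```

In a real flow, the removal step would have to be done by whatever script talks to the cache service; this model only shows the intended effect on routing.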