
NiFi MergeContent - flow file stuck in queue




Hello @Matt Clarke @Shu & Everyone!

I am trying to remove duplicate records from CSV files, using the processors below.

ExtractText - To remove the header from the 2 flow files (CSV files)

SplitText - To split the flow file content into individual lines

RouteOnContent - To route only the records, not the header

HashContent - To calculate a hash value for each of the split flow files

DetectDuplicate - To detect duplicate records and route the non-duplicate records to MergeContent

MergeContent - To merge the flow files again, based on the Correlation Attribute Name
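
To make the intent of the flow clearer, here is a minimal Python sketch of the same logic outside NiFi. It is not NiFi code and not my actual flow configuration; the function name `dedupe_csv_texts` is hypothetical, and it assumes every record sits on a single line (no quoted fields containing newlines) and that both files share the same header.

```python
import hashlib


def dedupe_csv_texts(csv_texts):
    """Merge CSV texts, keeping the first header and the first copy of each record.

    Hypothetical helper for illustration only; assumes one record per line.
    """
    seen = set()      # digests of records already kept (the role of the DetectDuplicate cache)
    header = None     # header of the first non-empty file
    records = []      # non-duplicate records, in arrival order (the merged output)

    for text in csv_texts:
        lines = text.splitlines()
        if not lines:
            continue
        if header is None:
            header = lines[0]
        # Skip the header line, then hash every remaining record.
        for line in lines[1:]:
            if not line.strip():
                continue
            digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # duplicate record: drop it
            seen.add(digest)
            records.append(line)

    return header, records
```

Called with the text of the two CSV files, it returns the shared header and the records with duplicates removed. The set of digests grows with the number of distinct records, which matters for the production volumes mentioned further down.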

With this flow, I sometimes run into an issue where a flow file gets stuck in the queue between DetectDuplicate and MergeContent. I have set Max Bin Age to 5 minutes, but even after 20 to 30 minutes the flow file has not been picked up by MergeContent. Could someone share their input? (Note: this is a standalone NiFi setup, not a cluster.) FlowFile expiration is set to 5 hours.

I have attached screenshots of the MergeContent configuration.

One more thing: in the production environment, we may get millions of records across the two CSV files. Is there a better way to find duplicates in CSV files than splitting them into individual lines? If there is no other way, which configuration settings do I need to take care of to avoid issues in production?




