
Issue with NiFi MergeContent: files stay in the queue indefinitely

Contributor

I have a flow that uses the MergeContent processor. I noticed lately that some flowfiles stay indefinitely in the queue just before MergeContent. I can't figure out the cause, so I am asking for your help!

This is the part of the flow I am talking about:

13492-1.png

Here is the configuration of the MergeContent processor. It merges on the attribute called "cle", whose value is the same for the 2 flowfiles in the queue, yet they still don't merge:

13493-2.png

Finally, here is the content of the queue:

13494-3.png

Is this due to the size of the first flowfile (710 MB)? Is there a maximum size for a bin? If so, why isn't the bin merged once it reaches that size?

Thank you for your help!

1 ACCEPTED SOLUTION

Super Mentor

@Mohammed El Moumni

Each Node in a NiFi cluster runs its own copy of the dataflow and works on its own set of FlowFiles.

13628-screen-shot-2017-03-14-at-14329-pm.png

Looking at the screenshot of your queue listing above, you can see that the two FlowFiles are not on the same node. Each node runs its own MergeContent processor, and each one is waiting for a second FlowFile to complete its bin. You will need to look back earlier in your dataflow to see how your data is being ingested by your nodes, and make sure that matching sets of files end up on the same node for merging.

Thanks,

Matt
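
The per-node behavior described above can be illustrated with a small Python sketch. This is a simplified model and not NiFi code: the function name `merge_per_node`, the node names, and the file names are all hypothetical, and a bin is assumed to be complete once it holds `min_entries` flowfiles with the same correlation value on the same node.

```python
from collections import defaultdict


def merge_per_node(flowfiles, min_entries=2):
    """Toy model: each node bins only its own flowfiles by correlation value.

    flowfiles: iterable of (node, correlation_value, name) tuples.
    Returns (merged, waiting): completed bins and flowfiles still queued.
    """
    bins = defaultdict(list)  # one bin per (node, correlation value)
    for node, key, name in flowfiles:
        bins[(node, key)].append(name)

    merged, waiting = [], []
    for (node, key), names in bins.items():
        if len(names) >= min_entries:
            merged.append((node, key, names))
        else:
            # Not enough entries on this node: the flowfiles keep waiting.
            waiting.extend((node, key, n) for n in names)
    return merged, waiting


# Hypothetical data: same "cle" value, but the pair is split across two nodes.
split = [("node1", "A", "data.csv"), ("node2", "A", "header.txt")]
# Same pair, both flowfiles landing on one node.
together = [("node1", "A", "data.csv"), ("node1", "A", "header.txt")]

print(merge_per_node(split))
print(merge_per_node(together))
```

In the first case neither node ever sees two entries for the bin, so both flowfiles wait; in the second the bin completes on `node1`.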


13 REPLIES

Super Guru

@Mohammed El Moumni

By default, a queue has a limit of 1 GB of data or 10,000 flowfiles.

To change these settings, open "Configure" on that queue and go to the Settings tab. See the attached screenshot.

If it helps, please vote/accept response.

13505-screen-shot-2017-03-10-at-100448-am.png

It is also possible that another queue or processor further downstream is stuck because of this default limit. You would have to raise the limit there and let that processor work through its backlog before your queue can start to drain. Think of the whole flow as a river with all kinds of streams and obstructions...

Contributor

Hi @Constantin Stanca, I changed the back pressure data size threshold to 2 GB, but the two flowfiles still don't merge...

13554-4.png

Super Mentor
@Constantin Stanca

@Mohammed El Moumni

Queue thresholds are per node. Once reached, they cause a queue to stop accepting additional FlowFiles; they do not prevent the downstream processor from processing FlowFiles that are already in that queue.

Had he received two 700 MB CSV files on one node, the 1 GB threshold would have been exceeded, preventing any additional FlowFiles from entering that queue (including the corresponding 70-byte header files). In that case you would be stuck, since MergeContent would not have the files it needs to complete a bin, even on a single node.

Thanks,

Matt
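
To make the threshold scenario above concrete, here is a minimal Python sketch of a single connection's queue on one node. It is a toy model under stated assumptions, not NiFi's implementation: the class `Connection` and its methods `offer` and `poll` are hypothetical names, and the defaults (1 GB, 10,000 objects) are the ones mentioned earlier in this thread.

```python
MB = 1024 ** 2
GB = 1024 ** 3


class Connection:
    """Toy model of one connection's queue on a single node (not NiFi code)."""

    def __init__(self, max_bytes=1 * GB, max_count=10_000):
        self.max_bytes = max_bytes
        self.max_count = max_count
        self.sizes = []  # sizes in bytes of the queued flowfiles

    def back_pressure(self):
        # Back pressure applies once either threshold has been reached.
        return (len(self.sizes) >= self.max_count
                or sum(self.sizes) >= self.max_bytes)

    def offer(self, size):
        """Upstream tries to enqueue; refused while back pressure applies."""
        if self.back_pressure():
            return False
        self.sizes.append(size)
        return True

    def poll(self):
        """Downstream can always take what is already queued."""
        return self.sizes.pop(0) if self.sizes else None


queue = Connection()
first = queue.offer(700 * MB)    # accepted, queue is below 1 GB
second = queue.offer(700 * MB)   # accepted, queue was still below 1 GB
header = queue.offer(70)         # refused, queue now holds about 1.4 GB
print(first, second, header)
```

In this model the two large files are accepted, the 70-byte header is refused, and `queue.poll()` still works, which mirrors the point that the threshold blocks new arrivals rather than downstream processing.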

Expert Contributor

@Mohammed El Moumni

If you take a look at the details of the flowfiles in the input queue for MergeContent, do you see the correlation attribute present on both flowfiles? Is it possible that, elsewhere in the flow, a flowfile with a correlation ID the same as one of the two flowfiles in the incoming queue was sent to a failure relationship and had been dropped from the flow? In the past, I have done a bit of processing of files from one of the Split* processors, and encountered errors processing one of the fragments. Due to the way I had designed the flow, the fragment with the error was routed to a failure relationship to another processor that terminated the processing of that flowfile, so not all the fragments from the split were sent to MergeContent. This caused all the other fragments to sit in the incoming queue of MergeContent indefinitely.
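
One way to check for the situation described above is to count queued flowfiles per correlation value and flag groups that are short of the expected size. The sketch below assumes the queue listing has been exported by hand into a list of dictionaries with an `attributes` mapping; that structure, the function name `incomplete_groups`, and the sample values are hypothetical, not a NiFi API.

```python
from collections import Counter


def incomplete_groups(flowfiles, attribute="cle", expected=2):
    """Return {correlation value: count} for groups smaller than `expected`.

    Flowfiles that lack the attribute are counted under the key None.
    """
    counts = Counter(ff["attributes"].get(attribute) for ff in flowfiles)
    return {key: n for key, n in counts.items() if n < expected}


# Hypothetical listing: group "A" is complete, group "B" lost a sibling,
# and one flowfile has no correlation attribute at all.
listing = [
    {"attributes": {"cle": "A"}},
    {"attributes": {"cle": "A"}},
    {"attributes": {"cle": "B"}},
    {"attributes": {}},
]
print(incomplete_groups(listing))
```

Any value reported here has fewer queued flowfiles than MergeContent would need, which points to a sibling that was dropped or routed elsewhere upstream.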

Contributor

Hi @Jeff Storck, the correlation attribute is present on both flowfiles and its value is the same. I am also sure that, for any given correlation attribute value, only two flowfiles will have that value. So with my settings (Minimum Number of Entries = 2, Maximum Number of Entries = 2), I am sure that only those two flowfiles will merge. Still, in my case the two flowfiles in the screenshot stay in the queue indefinitely... I am pretty sure it's a size problem, but I can't figure it out.

Expert Contributor

@Mohammed El Moumni Are other, smaller files merging? I notice in both of your screenshots that the MergeContent processor is stopped, which will prevent files from being merged. Was the processor stopped just to take the screenshots?

Contributor

@Jeff Storck Yes, the processor was stopped just to take the screenshots (I left it running for 1 day and the two files didn't merge). And yes, smaller files do merge (15 MB files, for example).

Expert Contributor

Good eyes, @Matt Clarke 🙂