Why is MergeContent leaving a file in the queue?

Super Collaborator

Hi,

I have 74 files in a directory, and I am merging them with MergeContent using a maximum of 20 files and a minimum of 4 files, as defined in my process below.

If you look at the dataflow, MergeContent processed only 73 of the 74 files and left 1 file in the queue. When I looked at data provenance, it showed 4 merged files: 3 with a merge count of 20 and 1 with a merge count of 13. I was expecting 4 files: 3 with a merge count of 20 and 1 with a merge count of 14.

Any idea why it is doing this, and what can we do to correct it?

[Screenshot: 6421-mergeprocess.png]

[Screenshot: 6420-mergecontent.png]

Accepted Solution

Master Mentor

@Saikrishna Tarapareddy

Were all 74 files in the input queue before MergeContent ran? Like the other processors, MergeContent works on a run schedule. My guess is that the last file was not yet in the queue at the moment MergeContent ran, so you saw only 13 files bundled instead of 14. With a minimum of 4 entries, it reads whatever is on the queue and bins it. You likely ended up with 3 bins of 20 and 1 bin of 13 because, at the moment it looked at the queue, 73 FlowFiles (so only 13 for the last bin) were all it saw. The 74th file arrived afterward and, on its own, does not meet the minimum of 4 entries, so it is left waiting for more files.

You can confirm this by stopping MergeContent and allowing all 74 files to queue before starting it. The behavior should then be as you expect.

It sounds like it is not important to have exactly 20 files per merged file. Perhaps you can set a Max Bin Age so that files don't get stuck. Something else you can do is adjust the run schedule so that MergeContent does not run as often. The default is "0 sec", which means run as fast as possible. Try changing that to somewhere between 1 and 10 sec to give the files a chance to queue. If you are picking up all 74 files at the same time, we are likely talking about a few milliseconds causing this last file to be missed.
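To make the timing argument concrete, here is a minimal Python model of the binning described above. It is a simplification, not NiFi's actual MergeContent code: the function `merge_pass` is hypothetical, and it assumes each run bins everything currently queued, merges any bin that reaches the maximum, and merges the remaining partial bin only if it meets the minimum. Max Bin Age is not modeled.

```python
from typing import List, Tuple


def merge_pass(queued: int, pending: int, min_entries: int, max_entries: int) -> Tuple[List[int], int]:
    """Simulate one (simplified, hypothetical) MergeContent run.

    queued:  FlowFiles visible on the queue when the processor runs
    pending: FlowFiles already held in a partial bin from earlier runs
    Returns (sizes of bins merged in this run, size of the bin still held).
    """
    merged: List[int] = []
    bin_size = pending
    for _ in range(queued):
        bin_size += 1
        if bin_size == max_entries:  # a full bin is merged immediately
            merged.append(bin_size)
            bin_size = 0
    # Assumption: once the visible queue is drained, a partial bin that meets
    # the minimum is merged; a smaller bin is held until more files arrive.
    if bin_size >= min_entries:
        merged.append(bin_size)
        bin_size = 0
    return merged, bin_size


if __name__ == "__main__":
    # Run 1 sees only 73 files; run 2 sees the late 74th file.
    first, held = merge_pass(73, 0, min_entries=4, max_entries=20)
    second, held = merge_pass(1, held, min_entries=4, max_entries=20)
    print(first, second, held)   # [20, 20, 20, 13] [] 1
    # All 74 files queued before the processor starts.
    print(merge_pass(74, 0, 4, 20))  # ([20, 20, 20, 14], 0)
```

In this model the single late file stays below the minimum of 4, which matches the 3 × 20 + 13 split you observed, while queuing all 74 first gives the expected 3 × 20 + 14.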

Thanks,

Matt

Super Collaborator

I think changing the run schedule worked. Thank you.
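If you need to apply the same change without the UI, the sketch below uses NiFi's REST API (`GET` and `PUT` on `/nifi-api/processors/{id}`) to set the processor's run schedule (`schedulingPeriod`) and, optionally, Max Bin Age. Treat it as an assumption-laden sketch: the helper names `build_update` and `update_processor` are hypothetical, it assumes an unsecured NiFi instance (no authentication), it assumes the MergeContent property key is "Max Bin Age", and NiFi generally requires the processor to be stopped before its configuration can be changed.

```python
import json
import urllib.request
from typing import Optional


def build_update(entity: dict, run_schedule: str, max_bin_age: Optional[str] = None) -> dict:
    """Build a PUT body that changes the run schedule (and optionally Max Bin Age)."""
    config: dict = {"schedulingPeriod": run_schedule}  # e.g. "5 sec"
    if max_bin_age is not None:
        # Assumption: MergeContent's property key is "Max Bin Age".
        config["properties"] = {"Max Bin Age": max_bin_age}
    return {
        # NiFi uses optimistic locking, so the current revision must be echoed back.
        "revision": entity["revision"],
        "component": {"id": entity["component"]["id"], "config": config},
    }


def update_processor(base_url: str, processor_id: str, run_schedule: str,
                     max_bin_age: Optional[str] = None) -> dict:
    """Fetch the processor, then PUT the updated config (hypothetical helper)."""
    url = f"{base_url}/nifi-api/processors/{processor_id}"
    with urllib.request.urlopen(url) as resp:
        entity = json.load(resp)
    body = json.dumps(build_update(entity, run_schedule, max_bin_age)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `update_processor("http://localhost:8080", "<processor-id>", "5 sec", "30 sec")` would request a 5-second run schedule and a 30-second Max Bin Age; the host, port, and values are placeholders.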