How to merge flowfiles in nifi?

Contributor

39425-seeforsolution.png

I have several FlowFiles with the same name (in my case the name can be a date), and I want to merge the FlowFiles that share a name. I tried MergeContent, raised Minimum Group Size to 10 KB, and even increased the Maximum Number of Bins, but nothing helps. I got this instead of one FlowFile per name. What should I do?

39424-see.png


Master Guru

Change Correlation Attribute Name to "filename" instead of "${filename}".

Contributor

I have changed it but nothing changed

Contributor

I have tried it, but it didn't help. Can you recommend a NiFi processor that can help me, or should I do it with Groovy code?

Master Guru

You probably also need to increase Minimum Number of Entries to something greater than 1.

Super Guru
@sally sally

Please increase Minimum Number of Entries to something greater than 1 (I'd say start with 10). Also increase Minimum Group Size. In your case, your first file looks like it's 72 KB and your Minimum Group Size is 10 KB. That one file alone satisfies the Minimum Group Size, and with Minimum Number of Entries still at 1, the merge conditions are already satisfied.
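
To make the point above concrete, here is a minimal Python sketch of the "both minimums must be met" check. It is a simplified model for illustration only, not NiFi's actual code; the function name `bin_ready` and its parameters are made up for this example.

```python
KB = 1024

def bin_ready(sizes_bytes, min_entries, min_group_size_bytes):
    """Simplified model (an assumption, not NiFi source): a bin qualifies
    for merging once BOTH the entry count and the total size reach their
    configured minimums."""
    return (len(sizes_bytes) >= min_entries
            and sum(sizes_bytes) >= min_group_size_bytes)

# A single 72 KB FlowFile, Minimum Group Size = 10 KB.
print(bin_ready([72 * KB], min_entries=1, min_group_size_bytes=10 * KB))   # True
print(bin_ready([72 * KB], min_entries=10, min_group_size_bytes=10 * KB))  # False
```

With Minimum Number of Entries left at 1, the 72 KB file qualifies on its own; raising it to 10 keeps the bin open until more FlowFiles arrive.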

Master Guru

@sally sally, I tried merging FlowFiles with the same name and it works as expected.
If you want to merge files regardless of their size, increase the Minimum Number of Entries to more than 1; the processor will then wait for more than one FlowFile and merge them into one.

In your case the Minimum Group Size is 10 KB. The first FlowFile is 72 KB, which is already more than the minimum group size, so it comes out as the same single FlowFile.
Your next FlowFile is 5 KB, which is less than the minimum group size, so it waits in its bin until the bin reaches 10 KB.

Also make sure you are using the merged relationship to get the merged files as output.

Example:-
My flow is as follows:

40418-flow-merge.png

In GenerateFlowFile I'm using 222 as the text, and in UpdateAttribute I set the filename to 2 if the text is 222 and to 3 if the text is 333.

MergeContent Config:-

40417-merge-config.png

1. In this processor the Minimum Group Size is 10 B, so it waits until the FlowFiles binned under the same filename add up to at least 10 B. Once the group reaches 10 B, it merges those files and sends them out as one merged file.

2. In my case every FlowFile is just 3 B, so the processor waited for 4 FlowFiles (12 B in total) because we set 10 B as the minimum group size. Once the bin reached 10 B, it sent all the merged content to the merged relationship.

3. We need to connect the merged relationship to another processor (in my case I connected it to UpdateAttribute).
Input:-

These FlowFiles all have the filename 2.
flowfile1:-
222
flowfile2:-
222
flowfile3:-
222
Up to this point the total size is 9 B, which is still below the minimum group size, so the processor waits for another FlowFile.
flowfile4:-
222
Output:-
222222222222
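
The worked example above can be replayed with a small Python sketch. This is only a toy model of binning by filename with a minimum group size, under the assumption that content is simply concatenated; `merge_by_filename` is a hypothetical helper, not a NiFi API.

```python
from collections import defaultdict

def merge_by_filename(flowfiles, min_group_size):
    """Toy model: group (filename, content) pairs by filename and emit a
    merged entry as soon as a group's total size reaches min_group_size.
    Returns (merged, still_waiting)."""
    bins = defaultdict(list)
    merged = []
    for filename, content in flowfiles:
        bins[filename].append(content)
        if sum(len(c) for c in bins[filename]) >= min_group_size:
            # Concatenate the binned contents and close the bin.
            merged.append((filename, b"".join(bins.pop(filename))))
    return merged, dict(bins)

# Four 3-byte FlowFiles, all with filename "2"; Minimum Group Size = 10 B.
incoming = [("2", b"222")] * 4
merged, waiting = merge_by_filename(incoming, min_group_size=10)
print(merged)   # [('2', b'222222222222')]
print(waiting)  # {}
```

With only three of these FlowFiles the bin holds 9 B, nothing is emitted, and all three stay in `waiting`.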

Super Mentor
@sally sally

By setting your minimums (Min Num Entries and Min Group Size) to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the length of time you are willing to allow a bin to hang around before it is merged, regardless of the number of entries in that bin or that bin's size.
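
As a rough illustration of how the minimums and "Max Bin Age" interact, here is a short Python sketch. It is a simplified model rather than NiFi's implementation, and `should_merge` and its parameter names are invented for this example.

```python
def should_merge(entry_count, size_bytes, age_seconds,
                 min_entries, min_size_bytes, max_bin_age_seconds):
    """Simplified model (an assumption): merge a bin when both minimums
    are met, or when the bin has been open for at least the max bin age."""
    meets_minimums = (entry_count >= min_entries
                      and size_bytes >= min_size_bytes)
    expired = age_seconds >= max_bin_age_seconds
    return meets_minimums or expired

# Large minimums: 3 small FlowFiles never qualify on their own...
print(should_merge(3, 9, age_seconds=10, min_entries=1000,
                   min_size_bytes=1_000_000, max_bin_age_seconds=300))   # False
# ...but the bin is merged anyway once it reaches the max bin age.
print(should_merge(3, 9, age_seconds=300, min_entries=1000,
                   min_size_bytes=1_000_000, max_bin_age_seconds=300))   # True
```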

As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, it will force merging of the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge.
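
The effect of having too few bins can be seen in a small Python sketch. Again, this is a toy model of the behaviour described above, not NiFi code; `assign_to_bins` is a hypothetical helper that only counts entries per bin.

```python
from collections import OrderedDict

def assign_to_bins(filenames, max_bins):
    """Toy model: one bin per unique filename. When a new filename arrives
    and all bins are in use, the oldest bin is force-merged to free a slot.
    Returns (forced_merges, open_bins) with entry counts per bin."""
    bins = OrderedDict()  # insertion order = creation order, oldest first
    forced = []
    for name in filenames:
        if name not in bins:
            if len(bins) >= max_bins:
                # Evict the oldest bin, however few entries it holds.
                forced.append(bins.popitem(last=False))
            bins[name] = 0
        bins[name] += 1
    return forced, dict(bins)

arrivals = ["a", "b", "c", "a", "b", "c"]
print(assign_to_bins(arrivals, max_bins=2))
# ([('a', 1), ('b', 1), ('c', 1), ('a', 1)], {'b': 1, 'c': 1})
print(assign_to_bins(arrivals, max_bins=3))
# ([], {'a': 2, 'b': 2, 'c': 2})
```

With three filenames and only two bins, every forced merge contains a single FlowFile; with three bins, each filename accumulates both of its FlowFiles.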

Thanks,

Matt