NiFi MergeContent generating 2 output files. 1 file expected

New Contributor

Hello,

 

I'm working with NiFi 1.17 (and I'm new to NiFi 😉)

 

I have the following flow: ... > MergeContent > UpdateAttribute > PutFile.

 

- 10,000 lines arrive at the MergeContent input.

- UpdateAttribute is used to change the filename (without this processor I get a warning that a file with the same name already exists).

 

PutFile is generating 2 files instead of one (one file with 7,000 lines, the other with 3,000 lines).

 

How do I set up these processors to get only one file on output?

 

Thank you 

 

1 REPLY

Master Mentor

@quimic 
Welcome to NiFi.
The PutFile processor does not produce new content, so I am not clear on what you mean by PutFile generating two files (one with 7,000 lines and the other with 3,000 lines).

What this sounds like is that two different NiFi FlowFiles (one with 7,000 lines of content and another with 3,000 lines of content) are being passed to your PutFile processor. The PutFile processor is then just writing each of them as a file in the configured directory.

I am guessing that you wanted your MergeContent processor to produce one FlowFile with 10,000 lines? If so, that requires a change to the configuration of your MergeContent processor. Are you always merging 10,000 FlowFiles into 1 FlowFile?

Please share some more details and the configuration of your MergeContent processor.

With NiFi, you need to understand that each processor operates independently of the processors before and after it. So the processor before your MergeContent will still be moving FlowFiles into the outbound connection that feeds MergeContent while MergeContent is executing. This means that not all 10,000 source FlowFiles may be in that connection when MergeContent starts allocating FlowFiles to a bin. If a bin meets the configured minimum merge criteria at the end of that execution, it gets merged; MergeContent then executes again and picks up the FlowFiles that were added to the connection since the last execution. The following properties control when the FlowFiles allocated to a bin get merged:

[Screenshot of the MergeContent processor properties: MattWho_0-1663188877176.png]
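To make the timing issue concrete, here is a toy Python simulation of the binning behavior described above. It is only a simplified sketch, not NiFi's actual implementation: `simulate`, `min_entries`, and `max_entries` are hypothetical names that stand in for the MergeContent properties "Minimum Number of Entries" and "Maximum Number of Entries", and the 7,000/3,000 arrival split is an assumption chosen to match the numbers in the question.

```python
def simulate(min_entries, max_entries, arrivals):
    """Toy model of MergeContent binning (NOT NiFi's real code).

    arrivals: number of FlowFiles waiting in the inbound connection
              at each successive execution of the processor.
    Returns the number of entries in each merged output FlowFile.
    """
    bin_size = 0   # FlowFiles currently allocated to the open bin
    merged = []    # sizes of the merged FlowFiles produced so far

    for queued in arrivals:
        remaining = queued
        while remaining > 0:
            # Allocate as many queued FlowFiles as the bin can still hold.
            take = min(remaining, max_entries - bin_size)
            bin_size += take
            remaining -= take
            # A full bin is merged immediately.
            if bin_size >= max_entries:
                merged.append(bin_size)
                bin_size = 0
        # At the end of the execution, merge the bin if it meets the minimum.
        if bin_size and bin_size >= min_entries:
            merged.append(bin_size)
            bin_size = 0

    return merged


# A minimum of 1 lets the first partial bin merge before the rest arrives.
print(simulate(min_entries=1, max_entries=10_000, arrivals=[7_000, 3_000]))

# Raising the minimum to 10,000 holds the bin open until it is complete.
print(simulate(min_entries=10_000, max_entries=10_000, arrivals=[7_000, 3_000]))
```

In this model the first call yields two merged outputs and the second yields one. In a real flow, a high minimum is usually paired with a Max Bin Age value so that an incomplete bin is still merged eventually instead of waiting forever.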


You may also find these articles helpful when working with MergeContent processors:
https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-u...
https://community.cloudera.com/t5/Community-Articles/How-to-address-JVM-OutOfMemory-errors-in-NiFi/t...

If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.

 

Thank you,

Matt