Member since 07-30-2019
3421 Posts
1628 Kudos Received
1010 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 90 | 01-13-2026 11:14 AM |
| | 222 | 01-09-2026 06:58 AM |
| | 523 | 12-17-2025 05:55 AM |
| | 584 | 12-15-2025 01:29 PM |
| | 564 | 12-15-2025 06:50 AM |
01-10-2017
06:57 PM
@Avish Saha The more bins you configure, the more of your NiFi JVM heap space could be used. Keep in mind that if all your bins hold, let's say, 990 KB of data and the next file would put any of those bins over 1024 KB, then the oldest bin will still be merged at only 990 KB to make room for a new bin to hold the file that would not fit in any of the existing bins. More bins equals more opportunities for a FlowFile to find a bin where it fits. Also keep in mind that, as you have it configured, it is possible for a bin to hang around indefinitely. A bin at 999 KB that never gets another qualifying FlowFile to bring its size between 1000 KB and 1024 KB will sit forever unless you set the Max Bin Age property. This property tells the MergeContent processor to merge a bin, no matter what its current state is, once it reaches this max age. I recommend you always set this value to the maximum amount of data latency you are willing to accept on this dataflow. If you found all this information helpful, please accept this answer. Matt
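To make the max bin age point concrete, here is a minimal Python sketch of the idea. It is a simplified model based only on the description above, not NiFi's actual implementation; the function name and the tuple layout are made up for illustration.

```python
def merge_expired_bins(open_bins, now_s, max_bin_age_s):
    """Split open bins into those forced to merge by age and those left open.

    open_bins: list of (created_at_s, total_kb) tuples (hypothetical layout).
    A bin is merged once its age reaches max_bin_age_s, whatever its size.
    """
    expired = [b for b in open_bins if now_s - b[0] >= max_bin_age_s]
    remaining = [b for b in open_bins if now_s - b[0] < max_bin_age_s]
    return expired, remaining


# A 999 KB bin created at t=0 never reached the minimum size,
# but a 60 second max bin age still forces it out at t=60.
open_bins = [(0, 999), (50, 400)]
expired, remaining = merge_expired_bins(open_bins, now_s=60, max_bin_age_s=60)
```

In this model the 999 KB bin is released at the age limit, which is why the max bin age effectively caps the latency a FlowFile can pick up while waiting in a bin.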
01-10-2017
06:46 PM
@Joshua Adeleke The SplitText processor simply splits the content of an incoming FlowFile into multiple FlowFiles. It lets you designate how many lines are considered the header and ignored, but it does not extract any content into FlowFile attributes. The ExtractText processor can be used to read parts of the content and assign those parts to different NiFi FlowFile attributes. It will not remove the header from the content; that would still be done by the SplitText processor. However, every FlowFile created by SplitText will inherit the unique FlowFile attributes from the parent FlowFile. Matt
01-10-2017
02:22 PM
@Avish Saha
Unless you know that the content of your incoming FlowFiles can be combined to exactly 1 MB without going over by even a byte, there is little chance you will see files of exactly 1 MB in size. The MergeContent processor will not truncate the content of a FlowFile to make a 1 MB output FlowFile.
The more common use case is to set an acceptable merged size range, for example min 800 KB and max 1 MB. FlowFiles larger than 1 MB will still pass through unchanged.
01-10-2017
02:07 PM
@Avish Saha
In the case where you are seeing a merged FlowFile larger than 1 MB, I suspect the merge is a single FlowFile that was already larger than the 1 MB max. When a FlowFile arrives that exceeds the configured max, it is passed to both the original and merged relationships unchanged. Decreasing the number of bins only impacts heap usage; it does not change this behavior.
01-10-2017
01:40 PM
1 Kudo
@Joshua Adeleke You could extract the header bits from the first two lines into FlowFile attributes before the SplitText processor. All the FlowFiles that come out of the SplitText processor will then carry these new FlowFile attributes as well. You can then use the FlowFile attributes in your PutSQL.
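As a rough illustration of that extract-then-split order, here is a small Python sketch. It only models the idea and does not use any NiFi API; the `key=value` header format, the function names, and the dict layout standing in for a FlowFile are all assumptions made for the example.

```python
import re


def extract_header_attributes(content, header_line_count=2):
    """Read key=value pairs (assumed header format) from the first lines."""
    lines = content.splitlines()
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    attributes = {}
    for line in header:
        match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)
        if match:
            attributes[match.group(1)] = match.group(2)
    return attributes, body


def split_with_attributes(content, header_line_count=2):
    """One output per body line; each copy inherits the parent's attributes."""
    attributes, body = extract_header_attributes(content, header_line_count)
    return [{"attributes": dict(attributes), "content": line} for line in body]


flowfiles = split_with_attributes("batch_id=42\nsource=pos\nrow1\nrow2")
```

Each split here carries `batch_id` and `source`, which is the property the later SQL step would rely on.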
01-10-2017
01:31 PM
1 Kudo
@Avish Saha The behavior you should be seeing here is that the MergeContent processor will take the first incoming FlowFile it sees and add it to bin 1. It will then continue to attempt to add additional FlowFiles to bin 1 until either 1000 total FlowFiles have been added or the min size of 1 MB has been reached. Now let's say bin 1 has grown to 1000 KB (just shy of 1 MB) and the next FlowFile would cause that bin to exceed the max group size of 1 MB. In this case that FlowFile would not be allowed to go into bin 1 and would be the first file to start bin 2. Bin 1 now hangs around because the min requirement of 1 MB has not been met, and neither max entries nor max group size has been reached either. So you can see it is possible to fill all 5 of your bins without meeting your very tightly configured thresholds. So what happens when the next FlowFile will not fit in any of the 5 existing bins? The MergeContent processor will merge the oldest bin to free room to start a new bin. So what I am assuming here is that you are seeing few or no files that are exactly 1 MB. Thanks, Matt
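The bin behavior described above can be sketched as a small Python simulation. This is a simplified model built only from the description in this reply (first bin that fits, oldest bin merged when all bins are in use, oversized FlowFiles passed through alone); it is not NiFi's actual code, and the function and parameter names are hypothetical.

```python
def simulate_bins(sizes_kb, min_kb, max_kb, max_entries, max_bins):
    """Return (merged_sizes, open_bin_sizes) after feeding sizes_kb in order."""
    bins = []    # open bins, oldest first; each bin is a list of FlowFile sizes
    merged = []  # total size of each merged output, in merge order
    for size in sizes_kb:
        if size > max_kb:
            # Larger than the max group size: passes through on its own.
            merged.append(size)
            continue
        # Look for the first open bin the FlowFile still fits into.
        idx = next(
            (i for i, b in enumerate(bins) if sum(b) + size <= max_kb), None
        )
        if idx is None:
            if len(bins) == max_bins:
                # No free bin: the oldest bin is merged as-is to make room.
                merged.append(sum(bins.pop(0)))
            bins.append([])
            idx = len(bins) - 1
        bins[idx].append(size)
        # A bin is merged once it reaches the min size or the max entry count.
        if sum(bins[idx]) >= min_kb or len(bins[idx]) >= max_entries:
            merged.append(sum(bins.pop(idx)))
    return merged, [sum(b) for b in bins]


# Seven 600 KB FlowFiles, min = max = 1024 KB, 5 bins:
# no two files fit together, so bins are only ever merged by eviction.
merged, still_open = simulate_bins([600] * 7, 1024, 1024, 1000, 5)
```

In this run every merged output is a single 600 KB FlowFile, which matches the observation that tight min and max thresholds rarely produce outputs of exactly 1 MB.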
01-09-2017
10:10 PM
@Rohit Ravishankar
If you found this information helpful, please accept the answer.
01-09-2017
10:08 PM
@Rohit Ravishankar Let's assume you want to merge the text-based content of both your ABC.txt and CTRL_ABC.txt files into one single NiFi FlowFile. The resulting content of that new merged FlowFile should consist of the content of CTRL_ABC.txt before ABC.txt. A simple combination of ListFile (need 2 of these) --> UpdateAttribute (need 2 of these) --> FetchFile --> MergeContent will do the trick.
The flow would look something like this: A template is below that can be uploaded to your NiFi which will show how to configure each of these processors so that they are merged in the proper order and merged with the appropriate CTRL files. merge-two-files-using-defragmentation.xml Hope this helps, Matt
01-09-2017
02:25 PM
1 Kudo
@Rohit Ravishankar
How do you plan to combine the "content" of these two files together? This is the first question that needs to be addressed. You can use the MergeContent processor to merge the "content" of multiple FlowFiles using binary concatenation. If this is acceptable, then all you need to do is consume both your data file and control file, use UpdateAttribute to extract a common name from both filenames into a new attribute, and finally use that new attribute as the "Correlation Attribute Name" in the MergeContent processor. You would also want to set the min number of entries to 2 in the MergeContent processor. Matt
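Here is a minimal Python sketch of the correlation idea: derive a common key from each filename, group on it, and concatenate once both members of a pair are present. It is a model only and calls no NiFi API. The `CTRL_` prefix convention comes from the filenames mentioned in this thread (ABC.txt and CTRL_ABC.txt); the function names and the choice to put the control content first are assumptions for the example.

```python
import re
from collections import defaultdict


def correlation_key(filename):
    """'CTRL_ABC.txt' and 'ABC.txt' both map to 'ABC'."""
    return re.sub(r"^CTRL_", "", filename).rsplit(".", 1)[0]


def merge_pairs(files):
    """files: dict of filename -> bytes. Merge only complete pairs."""
    groups = defaultdict(dict)
    for name, content in files.items():
        role = "control" if name.startswith("CTRL_") else "data"
        groups[correlation_key(name)][role] = content
    # A group with 2 entries mirrors a min number of entries of 2.
    return {
        key: group["control"] + group["data"]
        for key, group in groups.items()
        if len(group) == 2
    }


merged = merge_pairs({
    "ABC.txt": b"data\n",
    "CTRL_ABC.txt": b"ctrl\n",
    "XYZ.txt": b"unpaired\n",
})
```

Only the ABC pair is merged here; XYZ.txt has no control file yet, so in this model it stays unmerged.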
01-09-2017
02:15 PM
2 Kudos
@Joshua Adeleke Also check your NiFi app log for any Out Of Memory Errors (OOME). The SplitText processor may be having memory issues trying to split over 40k records. You could try using two SplitText processors in series, with the first using a "Line Split Count" of 10,000 and the second then splitting those 10,000-line FlowFiles with a "Line Split Count" of 1. This will greatly reduce the heap memory footprint. In addition, if you do a listing on the queue feeding your PutSQL processor, do you see any listed FlowFiles with an unexpected size? Matt
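The two-stage split can be sketched in Python as below. This only models the chunking arithmetic and does not reproduce NiFi's memory behavior; the function names are hypothetical.

```python
def split_lines(lines, line_split_count):
    """Cut a list of lines into chunks of at most line_split_count lines."""
    return [
        lines[i:i + line_split_count]
        for i in range(0, len(lines), line_split_count)
    ]


def two_stage_split(lines, first_count=10_000, second_count=1):
    """First split into large chunks, then split each chunk again."""
    result = []
    for chunk in split_lines(lines, first_count):
        # Only one large chunk is being broken down at a time here.
        result.extend(split_lines(chunk, second_count))
    return result


records = [f"row {n}" for n in range(40_000)]
first_stage = split_lines(records, 10_000)  # 4 chunks of 10,000 lines
final = two_stage_split(records)            # 40,000 single-line pieces
```

The end result is the same as a single split with a count of 1, but each second-stage step only ever handles 10,000 lines instead of the whole 40k at once.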