I have a CSV file with 100 records. I converted this file to JSON and then used the 'SplitJson' processor to split it into 100 flow files. In the middle of the data flow, I categorize these flow files as success or failure, so I introduced a custom attribute, 'status', on each flow file passing through. This attribute is set to either 'success' or 'failure'. I then use the 'RouteOnAttribute' processor on the 'status' attribute to route the flow files into two paths. On each path I am trying to use the 'MergeRecord' processor to merge the failure records into one file and the success records into another. But 'MergeRecord' does not merge either set of flow files, because the number of flow files on each path is less than the 'fragment.count' value, which is 100: there are 77 success records and 23 failure records. Can anyone suggest a solution?
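To make the symptom concrete, here is a minimal Python sketch of a defragment-style merge. It is only a simplified model of the behaviour described above, not NiFi code: the `defragment` function and the `split-1` identifier are hypothetical, and the assumption is that a group is released only when it holds exactly `fragment.count` fragments.

```python
from collections import defaultdict

def defragment(flow_files):
    """Simplified model (assumption): a group of fragments is merged only
    when its size equals the fragment.count recorded on the fragments."""
    bins = defaultdict(list)
    for ff in flow_files:
        bins[ff["fragment.identifier"]].append(ff)
    merged, waiting = [], []
    for group in bins.values():
        expected = int(group[0]["fragment.count"])
        (merged if len(group) == expected else waiting).append(group)
    return merged, waiting

# 100 fragments from one split; 77 tagged success, 23 tagged failure.
fragments = [
    {"fragment.identifier": "split-1", "fragment.index": i,
     "fragment.count": "100", "status": "success" if i < 77 else "failure"}
    for i in range(100)
]
success_path = [ff for ff in fragments if ff["status"] == "success"]
failure_path = [ff for ff in fragments if ff["status"] == "failure"]

merged_s, waiting_s = defragment(success_path)
merged_f, waiting_f = defragment(failure_path)
print(len(merged_s), len(waiting_s[0]))  # 0 77
print(len(merged_f), len(waiting_f[0]))  # 0 23
```

Under this model neither path can ever complete, because each one still expects 100 fragments after routing.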
Have you tried using MergeContent instead of MergeRecord? I've noticed that a lot of the record-oriented processors can be kind of picky about certain things. I believe MergeContent would accomplish what you're looking for, and it might be easier to use.
Also, do you have some kind of common identifier that you can use to tie the records back together? Maybe you could post a screenshot of your merge processor settings.
I tried the 'MergeContent' processor as well, but it does not wait for all 77 (or all 23) records to arrive before merging. It merges seemingly at random: looking at the data after 'MergeContent', some files contain 1 record and others contain 2 or 3.
Also, I do have a common attribute: 'filename'. My expectation is that flow files originating from different CSV files should finally be merged back into their corresponding files.
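The grouping being asked for can be sketched in Python as below. This only illustrates the intended result, not how NiFi implements it: the `merge_by_attributes` helper and the file names `a.csv` and `b.csv` are hypothetical. In NiFi itself, this kind of grouping is normally configured through the merge processor's Correlation Attribute Name property rather than written by hand.

```python
from collections import defaultdict

def merge_by_attributes(flow_files, keys=("filename", "status")):
    """Collect record contents into one bundle per (filename, status) pair."""
    bundles = defaultdict(list)
    for ff in flow_files:
        bundles[tuple(ff[k] for k in keys)].append(ff["content"])
    return dict(bundles)

# Hypothetical flow files coming from two different source CSV files.
flow_files = [
    {"filename": "a.csv", "status": "success", "content": {"id": 1}},
    {"filename": "a.csv", "status": "failure", "content": {"id": 2}},
    {"filename": "b.csv", "status": "success", "content": {"id": 3}},
    {"filename": "a.csv", "status": "success", "content": {"id": 4}},
]

bundles = merge_by_attributes(flow_files)
for key, records in bundles.items():
    print(key, records)
```

Because the success and failure paths are already separated by RouteOnAttribute, correlating on 'filename' alone within each path should give the same grouping.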
The following are the MergeRecord processor settings.
So I can't see what specifically is wrong here, but I can tell you I've done the same thing using MergeContent. I use the bin packing algorithm with binary concatenation.
Just another thought: you might be able to change the run schedule of the merge processor so that it only runs once a minute or so. That might ensure that all the successful flowfiles have queued up behind the processor before it executes.
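As a rough illustration of why a longer run schedule could help, here is a toy Python model. It assumes, as a simplification, that every time the merge step runs it bundles whatever is queued at that moment; the `simulate_merges` function, the tick unit, and the arrival pattern are all hypothetical. Real MergeContent behaviour also depends on properties such as Minimum Number of Entries and Max Bin Age, which this sketch ignores.

```python
from collections import Counter

def simulate_merges(arrival_ticks, run_every):
    """Return the size of each bundle when the merge step runs every
    `run_every` ticks and merges everything queued at that moment."""
    arrivals = Counter(arrival_ticks)
    # First scheduled run at or after the last arrival (ceiling to a multiple).
    last_run = -(-max(arrival_ticks) // run_every) * run_every
    bundles, queued = [], 0
    for tick in range(1, last_run + 1):
        queued += arrivals[tick]
        if tick % run_every == 0 and queued:
            bundles.append(queued)
            queued = 0
    return bundles

# 77 flow files trickling in, two per tick (hypothetical arrival pattern).
arrivals = [1 + i // 2 for i in range(77)]

print(simulate_merges(arrivals, 1)[:3])  # [2, 2, 2]
print(simulate_merges(arrivals, 60))     # [77]
```

With a run on every tick the output is many tiny bundles, similar to the 1 to 3 record files reported earlier in the thread; with one run per 60 ticks, all 77 end up in a single bundle.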