The requirement is: in NiFi, first read the data from MongoDB, then convert each JSON FlowFile into a CSV FlowFile using the InferAvroSchema processor, and then merge all of those CSV FlowFiles into one CSV file.
For me, the output is created as several bundles of CSV files (i.e., if there are 100 FlowFiles, the output is bundled into files of 50, 35, 8, etc.).
Please guide me on this.
What NiFi processor are you using to merge your CSV files together?
How do you have that NiFi processor configured?
Is your NiFi a multi-node cluster or a single standalone NiFi instance?
Thanks for your reply.
I am using the MergeRecord processor. PFA for the property settings.
As of now, I am running NiFi as a single standalone instance.
A couple of possibilities come to mind here:
1. Based on the screenshot you provided, you have configured a Max Bin Age of 2 minutes. The timer on a bin starts when the first FlowFile is allocated to that bin. It is possible that a bin is being merged at the 2-minute mark while it still holds fewer than the desired 100 FlowFiles. Try setting Max Bin Age to a higher value and see what the results are.
2. You have 10 allocated bins, but possibly 10 or more unique schemas in use by your JsonTreeReader. Only FlowFiles with like schemas will be allocated to the same bin. When a new FlowFile is being allocated and it does not match any existing bin, it is allocated to a new bin; if no free bins exist, the oldest bin is merged to free one up, even if it has not yet reached the configured minimums. Verify that all 100 source FlowFiles use the exact same schema.
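To illustrate the two points above, here is a sketch of MergeRecord property settings that would force a single merged output for a batch of 100 FlowFiles. The property names are the real MergeRecord properties; the specific values (100 records, 10-minute bin age) are assumptions for your scenario, so adjust them to your data volume and latency tolerance:

```
# MergeRecord properties (sketch; values are assumptions for this example)
Merge Strategy             = Bin-Packing Algorithm
Minimum Number of Records  = 100      # don't merge a bin until 100 records arrive
Maximum Number of Records  = 100      # cap the bin at the same count
Max Bin Age                = 10 min   # safety valve; long enough for all 100 FlowFiles
Maximum Number of Bins     = 10       # only helps if all FlowFiles share one schema
```

With Minimum and Maximum Number of Records both set to 100, a bin is only merged early if Max Bin Age expires or the bin is evicted to free space, which is why point 1 (raise Max Bin Age) and point 2 (one schema per batch) both matter.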