I'm looking at NiFi to fulfill a requirement where we have a directory of possibly hundreds of thousands of JSON and binary files. Each JSON file needs to be paired with its binary partner and zipped. I've been looking at MergeContent to achieve this.
As I understand it, NiFi will create a certain number of bins in memory, and once the prerequisites for the number of files in a bin are met, the merge occurs.
However, are there issues with this when you have a large number of files? Say flowfiles arrive at MergeContent at different times and one is sitting in a bin waiting while its partner is still queued. Is there a strategy to deal with this? I've considered using the JSON file to feed a GetFile processor and then sending them both off at the same time, but I don't know how to sync this and keep the original flowfile when using GetFile.
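To make the concern concrete, here is a rough Python sketch of how I picture the binning. It is only my mental model, not NiFi's actual code: the function names, the file names, the idea of keying bins on the file name without its extension, and the rule that the oldest bin is flushed when no free bin is left are all assumptions on my part.

```python
from collections import OrderedDict
from pathlib import PurePath


def pair_key(filename):
    """Correlation key: the file name without its extension (assumed to match between partners)."""
    return PurePath(filename).stem


def simulate_binning(arrivals, entries_per_bin=2, max_bins=3):
    """Toy model of binning by a correlation key; not NiFi's real implementation.

    Returns (merged, evicted): bins that filled up, and bins that were
    flushed early because the bin limit was reached.
    """
    bins = OrderedDict()  # key -> list of file names, oldest bin first
    merged, evicted = [], []
    for name in arrivals:
        key = pair_key(name)
        if key not in bins and len(bins) >= max_bins:
            # No free bin: flush the oldest one even though it is incomplete.
            _, stale = bins.popitem(last=False)
            evicted.append(stale)
        bins.setdefault(key, []).append(name)
        if len(bins[key]) == entries_per_bin:
            # Both partners are present, so this bin can be merged (zipped).
            merged.append(bins.pop(key))
    return merged, evicted


# Hypothetical arrival order: several JSON files show up before any partner does.
arrivals = ["a.json", "b.json", "c.json", "d.json", "a.bin", "b.bin"]
merged, evicted = simulate_binning(arrivals, max_bins=3)
print("merged:", merged)
print("evicted early:", evicted)
```

With this made-up arrival order and only three bins, nothing gets paired in the toy model, because each waiting JSON file is pushed out before its partner arrives. Raising `max_bins` to 10 pairs `a` and `b` as expected. That is the situation I am worried about at scale.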