MergeContent - Two files per merge, thousands of files

I'm looking at NiFi to fulfill a requirement: we have a directory of possibly hundreds of thousands of JSON and binary files. Each JSON file needs to be paired with its binary partner and zipped. I've been looking at MergeContent to achieve this.
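
For reference, the pairing rule itself can be written as a short standalone Python sketch that runs outside NiFi. It assumes each pair shares a base name (for example `record1.json` and `record1.bin`) and that binary files use a `.bin` extension. Both are assumptions, since the post does not say how partners are matched, and `pair_and_zip` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def pair_and_zip(src_dir, out_dir, binary_suffix=".bin"):
    """Zip each <stem>.json together with its <stem><binary_suffix> partner.

    Assumption: partners share the same base name. Returns a tuple of
    (paths of the zip files created, stems of JSON files with no partner).
    Binary files that have no JSON partner are not reported.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created, unpaired = [], []
    for json_path in sorted(src.glob("*.json")):
        # Replace only the last suffix: "a.json" -> "a.bin"
        partner = json_path.with_suffix(binary_suffix)
        if not partner.exists():
            unpaired.append(json_path.stem)
            continue
        zip_path = out / f"{json_path.stem}.zip"
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(json_path, arcname=json_path.name)
            zf.write(partner, arcname=partner.name)
        created.append(zip_path)
    return created, unpaired
```

Driving the loop from the JSON files means a missing partner shows up immediately as an unpaired stem instead of waiting in memory.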

As I understand it, NiFi creates a certain number of bins in memory, and once a bin meets its requirement for the number of files, the merge occurs.
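
The binning behavior described above can be modeled in a few lines. This is a simplified toy, not NiFi's implementation. `ToyBinManager` is a hypothetical name. The key plays the role of a correlation value (comparable to MergeContent's Correlation Attribute Name). The rule that the oldest open bin is flushed incomplete once the bin limit is reached is an assumption of the model.

```python
from collections import OrderedDict


class ToyBinManager:
    """Toy model of correlated binning (hypothetical, not NiFi's code).

    Files with the same key share a bin; a bin is emitted as merged once
    it holds `min_entries` files. If a new key arrives while `max_bins`
    bins are already open, the oldest bin is flushed incomplete
    (an assumption of this model).
    """

    def __init__(self, min_entries=2, max_bins=5):
        self.min_entries = min_entries
        self.max_bins = max_bins
        self.bins = OrderedDict()      # key -> list of filenames, oldest bin first
        self.merged = []               # bins that reached min_entries
        self.flushed_incomplete = []   # bins forced out before completing

    def offer(self, key, filename):
        # Opening a new bin at the limit evicts the oldest open bin.
        if key not in self.bins and len(self.bins) >= self.max_bins:
            _, oldest = self.bins.popitem(last=False)
            self.flushed_incomplete.append(oldest)
        self.bins.setdefault(key, []).append(filename)
        if len(self.bins[key]) >= self.min_entries:
            self.merged.append(self.bins.pop(key))


# Partners arrive interleaved; with enough bins, both pairs still complete.
mgr = ToyBinManager(min_entries=2, max_bins=2)
for name in ["a.json", "b.json", "a.bin", "b.bin"]:
    mgr.offer(name.rsplit(".", 1)[0], name)
print(mgr.merged)  # [['a.json', 'a.bin'], ['b.json', 'b.bin']]
```

In this model arrival order does not matter as long as the number of simultaneously open bins stays under the limit; with `max_bins=1` the same arrival order completes no pairs at all.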

However, are there issues with this when you have a large number of files? Say FlowFiles arrive at MergeContent at different times, and one is sitting in a bin waiting while its partner is still queued. Is there a strategy to deal with this? I've considered using the JSON file to feed a GetFile processor and then sending both files off at the same time, but I don't know how to synchronize them and keep the original FlowFile when using GetFile.
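
The timing concern can be illustrated with an age limit on open bins. In this hypothetical toy model (`AgedBins`, with a simulated clock in seconds), a bin that has not collected both partners within `max_age` of being opened is set aside, so an orphan cannot wait forever. MergeContent's Max Bin Age property plays a comparable role, but how the processor treats an aged-out bin should be checked against its documentation rather than inferred from this sketch.

```python
class AgedBins:
    """Toy model of age-based bin expiry (hypothetical; times in seconds).

    A bin that has not reached `min_entries` within `max_age` seconds of
    being opened is moved to `expired`, bounding how long a file can wait.
    """

    def __init__(self, min_entries=2, max_age=300.0):
        self.min_entries = min_entries
        self.max_age = max_age
        self.bins = {}     # key -> (opened_at, [filenames])
        self.merged = []   # complete bins
        self.expired = []  # bins that timed out before completing

    def expire(self, now):
        # Collect keys first so the dict is not mutated while iterating.
        stale = [k for k, (opened, _) in self.bins.items() if now - opened >= self.max_age]
        for key in stale:
            self.expired.append(self.bins.pop(key)[1])

    def offer(self, key, filename, now):
        self.expire(now)
        _, files = self.bins.setdefault(key, (now, []))
        files.append(filename)
        if len(files) >= self.min_entries:
            self.merged.append(self.bins.pop(key)[1])


bins = AgedBins(min_entries=2, max_age=60)
bins.offer("a", "a.json", now=0)
bins.offer("b", "b.json", now=10)
bins.offer("a", "a.bin", now=30)  # within 60 s of opening: pair "a" merges
bins.offer("b", "b.bin", now=90)  # bin "b" opened at 10, so it expires first
print(bins.merged)   # [['a.json', 'a.bin']]
print(bins.expired)  # [['b.json']]
```

In this run `b.bin` arrives after its bin has expired and opens a new bin of its own, which will also expire unless another file joins it. The age limit bounds memory use, but on its own it does not recover late partners.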
