I've been sitting on a tough NiFi problem for a little while now and I need some architecture suggestions.
I am receiving CSV files of varying sizes and splitting them to do some processing on individual records. When every record from a file has been processed, I need to send out a notification and also merge the altered records back together for the next step. The Defragment strategy of the merge processors (MergeContent/MergeRecord) has been extremely helpful for this, but I need to distribute the work across my cluster, and the merge requires all of the fragments to be on the same node.
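For context, here is roughly how my split/merge is wired up. The split processors stamp the standard fragment attributes, and Defragment uses them to decide when a file is complete (property names are from memory, so treat this as approximate):

```
# Attributes written by SplitRecord/SplitText on each split flowfile:
#   fragment.identifier  -- same ID for every split from one parent file
#   fragment.index       -- position of this split within the parent
#   fragment.count       -- total number of splits from the parent

# Merge processor (MergeContent or MergeRecord), illustrative settings:
Merge Strategy = Defragment
# Defragment holds fragments until all fragment.count pieces sharing a
# fragment.identifier have arrived -- in a single node's queue, which is
# exactly the constraint I'm fighting in a cluster.
```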
I have considered using ListenHTTP and PostHTTP processors in the flow to funnel the flowfiles to a single node, but I see a couple of problems with that:
1. If I run ListenHTTP on the primary node only, the primary can switch to another node before all of the flowfiles have gathered for the merge, stranding partial batches.
2. If I pin ListenHTTP to a specific node in my cluster, that node becomes a single point of failure: if it goes down, the whole merge stops.
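To make the idea concrete, this is the shape of the flow I was sketching (the port and path are placeholders, not a working config):

```
# All nodes:
#   GetFile -> SplitRecord -> (per-record processing) -> PostHTTP
#                                 |
#                                 v
#   PostHTTP URL = http://<collector-node>:8081/contentListener
#
# Collector node only:
#   ListenHTTP (port 8081) -> MergeContent (Defragment) -> notify / next step
#
# Problem 1: collector = primary node  -> primary can change mid-merge
# Problem 2: collector = a fixed node  -> single point of failure
```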
Does anyone see a different way to solve this problem? Or is there some HA functionality I'm forgetting that would fix one of my current ideas?