I need to consolidate Flowfiles onto one node for a defragmented merge, how can I guarantee delivery?

New Contributor

Hey all!

I've been sitting on a tough NiFi problem for a little while now, and I need some architecture suggestions.

I am receiving CSV files of varying sizes and splitting them to do some processing on individual records. I need to send out a notification when every record from a file has been processed, and I also need to merge the altered records back together for the next step. The Defragment strategy of the merge processors has been extremely helpful for this, but I need to distribute the work across my cluster, and MergeRecord needs all of the fragments from a given file on the same node.
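To show why colocation matters, here is a toy Python sketch (not NiFi code, just an illustration) of how a Defragment merge bins fragments using the standard fragment.identifier / fragment.index / fragment.count attributes that the split processors write. A bin is only released when every fragment is present, so fragments scattered across nodes never complete:

```python
from collections import defaultdict

# Toy illustration (not NiFi code) of the Defragment merge strategy.
# Split processors stamp each fragment with fragment.identifier,
# fragment.index, and fragment.count; the merge releases a bin only
# once it holds every index for that identifier, which is why all
# fragments from one source file must land on the same node.

def defragment(flowfiles):
    bins = defaultdict(dict)   # fragment.identifier -> {index: content}
    counts = {}                # fragment.identifier -> expected count
    merged = []

    for ff in flowfiles:
        attrs = ff["attributes"]
        frag_id = attrs["fragment.identifier"]
        bins[frag_id][int(attrs["fragment.index"])] = ff["content"]
        counts[frag_id] = int(attrs["fragment.count"])

    for frag_id, fragments in bins.items():
        if len(fragments) == counts[frag_id]:   # every fragment present
            ordered = [fragments[i] for i in sorted(fragments)]
            merged.append({"fragment.identifier": frag_id,
                           "content": "".join(ordered)})
        # otherwise the bin keeps waiting -- fragments routed to other
        # nodes never arrive, so the bin never fills
    return merged

# Example: both fragments of file "abc" are on this node, but "xyz"
# is missing a fragment (stranded on another node) -- only "abc" merges.
ffs = [
    {"attributes": {"fragment.identifier": "abc", "fragment.index": "0",
                    "fragment.count": "2"}, "content": "row1\n"},
    {"attributes": {"fragment.identifier": "abc", "fragment.index": "1",
                    "fragment.count": "2"}, "content": "row2\n"},
    {"attributes": {"fragment.identifier": "xyz", "fragment.index": "0",
                    "fragment.count": "2"}, "content": "rowA\n"},
]
print(defragment(ffs))   # only the "abc" bin is complete
```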

I have considered using ListenHTTP and PostHTTP processors in the flow to send the FlowFiles to a single node (a rough sketch of that hop follows the list below), but I see a couple of problems coming up with that:

1. I use ListenHTTP on the primary node only, and the primary role switches to another node before all of the FlowFiles have gathered for the merge.

2. I use ListenHTTP on one specific node in my cluster, and that node goes down.
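To make the idea concrete, here is a minimal Python sketch standing in for what the PostHTTP hop would do: push one fragment, with its fragment attributes as headers, to a single node's ListenHTTP endpoint. The hostname, port, and base path are assumptions (ListenHTTP's default base path is contentListener, but check the processor config):

```python
import requests

# Minimal stand-in for the PostHTTP -> ListenHTTP hop (illustration only;
# inside NiFi this would be a PostHTTP processor, not Python).
# Hostname and port are made up; "contentListener" is ListenHTTP's
# default base path, but verify against the actual processor settings.
LISTEN_URL = "http://nifi-node-1.example.com:9999/contentListener"

def send_fragment(content: bytes, attributes: dict) -> None:
    # Fragment attributes are passed as HTTP headers so the receiving
    # side can restore them (ListenHTTP can map headers back to
    # attributes via its headers-to-attributes regex property).
    resp = requests.post(LISTEN_URL, data=content, headers=attributes)
    resp.raise_for_status()   # no retry or failover here: if nifi-node-1
                              # is down, delivery simply fails

send_fragment(b"row1\n", {
    "fragment.identifier": "abc",
    "fragment.index": "0",
    "fragment.count": "2",
})
```

The hard-coded target URL is the crux of both problems above: whichever node it points at has to stay up, and stay in the same role, until every fragment has arrived.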

Does anyone see a different way to go about solving this problem? Is there some HA functionality I am overlooking that would cover my current ideas?

Thanks for any help!
