
Looking for feedback on NiFi Cluster setup

Expert Contributor

Hi All,

We're setting up a NiFi cluster with two additional requirements: 1) capture provenance data, and 2) load balance the incoming data that is pushed to NiFi. We're envisioning the architecture below, in which a separate NiFi instance receives the data pushed by the data producer and load balances it across the cluster. The same instance would also archive provenance data from the cluster to HDFS.

I would appreciate your feedback on our design. Do you see any flaws? Would this work? Do you have any suggestions for better ways of achieving this?

Thanks.

13037-nifi-provenance-load-balancing.png



Master Guru

What protocol is the data producer using to push data to the first NiFi instance?

Expert Contributor

It's TCP/IP, and the data is in HL7 format (a format used in the healthcare industry).

Master Guru

OK, I'm going to assume ListenTCP is the entry point then. Let me know if that is not true.

My thought is to reverse this a little bit, because right now, if your first NiFi instance goes down, your data producer has nowhere to send the data.

Data Producer -> Load Balancer (nginx supports TCP) -> NiFi Cluster with each node having ListenTCP.

Then have this cluster push the provenance data to a standalone NiFi instance that just puts it into HDFS. This way, the second NiFi instance is not in the critical path of the real data and is only responsible for the provenance data. Depending on how important the provenance data is to you, you could make this a two-node cluster to ensure at least a minimum amount of failover.
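
To illustrate what the load balancer in front of the cluster does, here is a minimal Python sketch of round-robin node selection that skips nodes that are down. It is not how nginx is implemented, just the idea; the node host names and port 9090 are made up for the example.

```python
import socket

# Hypothetical ListenTCP addresses; replace with the real cluster nodes and port.
NODES = [("nifi-node1", 9090), ("nifi-node2", 9090), ("nifi-node3", 9090)]


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_node(nodes, start, check=is_reachable):
    """Return (index, node) for the first reachable node at or after `start`.

    The search wraps around the list once. Returns None if no node is up.
    """
    for offset in range(len(nodes)):
        i = (start + offset) % len(nodes)
        host, port = nodes[i]
        if check(host, port):
            return i, nodes[i]
    return None
```

A caller would keep the last returned index and pass `index + 1` as `start` for the next connection, so traffic rotates across the nodes and a failed node is simply skipped.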

Expert Contributor

Yes, we're using ListenTCP.

I agree with your recommendation.

Our data producer cannot send data to multiple IPs; it can send to just one. So we're exploring an external load balancer appliance that sits in front of the NiFi cluster, and I am keeping this Site-to-Site design as a backup option. In our case, some custom coding needs to be done on the data producer side to make load balancing work, so the Site-to-Site design is just a fallback in case we have trouble making it work.

Master Guru

Makes sense. I think haproxy (http://www.haproxy.org/) is a free load balancer that supports TCP; your data producer can then just send to the haproxy address.
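
From the producer's side, sending to a single address might look roughly like the Python sketch below. The host name and port are placeholders, and message framing is left out on purpose because it depends on how the producer and ListenTCP are configured.

```python
import socket

# Hypothetical address of the load balancer; the producer only needs this one.
LB_HOST = "nifi-lb.example.internal"
LB_PORT = 9090


def send_message(payload: bytes, host: str = LB_HOST, port: int = LB_PORT,
                 timeout: float = 5.0) -> None:
    """Open a TCP connection to the load balancer and send one payload.

    The load balancer, not the producer, decides which NiFi node
    receives the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

The producer never learns the addresses of the individual NiFi nodes, so nodes can be added or removed behind the load balancer without changing the producer.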