
What are best practices to configure NiFi clusters for high availability across two data centers?


I am looking for options and best practices for configuring NiFi clusters (with external ZooKeeper servers) for high availability and failover when an entire data center goes offline. I plan to put the clusters in two separate data centers, and processing needs to continue if one data center fails completely (for example, it loses all connectivity or power).

I plan to start with three ZooKeeper servers and three NiFi servers in one data center, and the same numbers in the second data center. I want a separate, external ZooKeeper setup because I am also going to run Kafka, and NiFi's embedded ZooKeeper produced many fatal errors when I tried to reuse it for Kafka.
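One constraint worth checking for the planned 3 + 3 layout: a ZooKeeper ensemble only keeps working while a strict majority of its servers can reach each other. The Python sketch below applies that majority rule to an ensemble stretched across two data centers and reports whether losing one whole data center still leaves a quorum. The function names and layouts are illustrative only, not part of any NiFi or ZooKeeper API.

```python
# Sketch: does a ZooKeeper ensemble split across data centers keep quorum
# when one whole data center is lost? Layouts below are illustrative.

def quorum_size(ensemble_size: int) -> int:
    """ZooKeeper needs a strict majority of the ensemble to be up."""
    return ensemble_size // 2 + 1


def survives_dc_loss(servers_per_dc: dict, lost_dc: str) -> bool:
    """True if the servers outside lost_dc still form a strict majority."""
    total = sum(servers_per_dc.values())
    remaining = total - servers_per_dc[lost_dc]
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    layouts = {
        "3 + 3 (planned)": {"dc1": 3, "dc2": 3},
        "3 + 2": {"dc1": 3, "dc2": 2},
    }
    for name, layout in layouts.items():
        total = sum(layout.values())
        for dc in layout:
            status = "quorum kept" if survives_dc_loss(layout, dc) else "quorum LOST"
            print(f"{name}: lose {dc} -> {status} "
                  f"(need {quorum_size(total)} of {total})")
```

With six servers split evenly the quorum is four, so losing either data center leaves three servers, and the ensemble can no longer elect a leader or process writes. An uneven split such as 3 + 2 only survives the loss of the smaller site. This applies to one ensemble stretched across both sites; two independent per-data-center ensembles (as in the second option below) each keep their own quorum.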

The two data centers have a low-latency, high-bandwidth network connection, so there is little overhead in communicating between them. All NiFi servers in both data centers query a single database server, which exists in only one data center, then transform the data and write it to a cache (there is a separate cache in each data center). Because there is only one DB server, if that DB is down we switch to read-only mode and use the surviving cache to know what is in the DB. So the caches in both data centers need to stay hot with up-to-date information.

One option is to configure all the NiFi servers in both data centers as a single logical cluster and keep both caches updated in a fully hot-hot configuration: the data would be queried once and written simultaneously to both caches, one in each data center.
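The hot-hot idea in the first option amounts to a fan-out write: each query result is written to every cache, and a failed write to one data center's cache should not block the write to the other. A minimal Python sketch, assuming a hypothetical `Cache` class standing in for the real per-data-center cache client and a hypothetical `fan_out` helper:

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Hypothetical stand-in for one data center's cache client."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} unreachable")
        self.data[key] = value


def fan_out(records, caches):
    """Write every record to every cache; collect failures per cache so that
    one unreachable data center does not block the other."""
    failed = {c.name: [] for c in caches}
    for key, value in records:
        for cache in caches:
            try:
                cache.put(key, value)
            except ConnectionError:
                failed[cache.name].append((key, value))  # keep for later replay
    return failed


if __name__ == "__main__":
    dc1 = Cache("cache-dc1")
    dc2 = Cache("cache-dc2", available=False)  # simulate losing the second DC
    rows = [("order:1", {"total": 10}), ("order:2", {"total": 25})]  # queried once
    pending = fan_out(rows, [dc1, dc2])
    print(dc1.data)
    print(pending)
```

In NiFi terms, a similar fan-out can be built by connecting the same relationship to two downstream processors, one per cache, since NiFi clones the FlowFile for each outgoing connection. How the `pending` records would be replayed once the other cache is back is left open here.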

The other obvious option is to have two separate NiFi clusters, one per data center, and have each cluster read from the DB separately (which doubles the DB-intensive SQL queries) and write only to the cache in its own data center. What are the pros and cons of these two approaches, and does anyone have a better option than these two?

FYI: I read this article, which gave me the two ideas above, but it was written for NiFi 0.x and goes into detail on the NCM (NiFi Cluster Manager), which is no longer used in NiFi 1.3.0. Ideally, this URL would be updated to reflect how NiFi 1.3.0 works (hint, hint @chakra).