Created 09-06-2016 03:31 PM
I'd like to get a poll of the pros and cons of Kafka vs. NiFi for multi-datacenter replication in terms of ease of use, tooling, tuning, security, etc.
Created 09-06-2016 06:14 PM
Kafka MirrorMaker is designed for the sole purpose of replicating Kafka topic data from one data center to another.
Pros:
1. Simple to set up.
2. Uses Kafka's producer and consumer APIs, which makes it easy to enable wire encryption (SSL) and Kerberos. (NiFi can offer the same, since both use the same APIs.)
3. Designed to replicate all the topics in the source data center to the target data center. Users can also pick specific topics if they wish.
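For point 3, MirrorMaker picks topics with a regular expression passed via its `--whitelist` option. The sketch below only mimics that selection logic in plain Python so you can check a pattern against a list of topic names before deploying; `select_topics` is a hypothetical helper, not part of Kafka, and it assumes the pattern must match the whole topic name (as Java's `String.matches` does).

```python
import re


def select_topics(topics, whitelist):
    """Hypothetical helper: return the topics a regex whitelist would select.

    Assumes full-match semantics (the pattern must cover the whole topic
    name), mirroring Java's String.matches.
    """
    pattern = re.compile(whitelist)
    return sorted(t for t in topics if pattern.fullmatch(t))


source_topics = ["orders", "orders-dlq", "clicks", "logs"]

# ".*" selects every topic in the source cluster.
print(select_topics(source_topics, ".*"))
# An alternation selects only the named topics; "orders-dlq" is not picked
# because "orders" does not match the whole name.
print(select_topics(source_topics, "orders|clicks"))
```

Because matching is anchored to the whole name, a pattern like `orders` will not accidentally pull in `orders-dlq`; use `orders.*` if you want both.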
Cons:
1. Hard to monitor. MirrorMaker is just a JVM process, so provisioning and monitoring it can be hard. You need to watch the metrics MirrorMaker emits to see whether it is lagging or whether data has stopped being produced to the target cluster.
2. MirrorMaker does not preserve the source topic's offsets in the target cluster (NiFi or any other solution runs into the same limitation), because writing a message into the target data center assigns it a new offset.
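Both cons can be illustrated with a toy in-memory simulation. This is not Kafka client code: `Partition`, `mirror`, and `consumer_lag` are hypothetical names, and a partition is modeled as a plain append-only list. It shows that mirrored records get whatever offsets the target log assigns (so source offsets are not preserved), and that lag is the gap between the source log's end offset and the position the mirroring consumer has reached. On a real cluster the equivalent numbers come from the mirroring consumer group's committed offsets; the `kafka-consumer-groups` tool shipped with Kafka reports this per partition as LAG.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy stand-in for one topic partition: an append-only log."""
    log: list = field(default_factory=list)

    def append(self, value):
        # The log assigns the next offset; the writer cannot choose it.
        self.log.append(value)
        return len(self.log) - 1

    def end_offset(self):
        # Offset the next appended record would receive.
        return len(self.log)


def mirror(source, target, start, max_records):
    """Copy up to max_records from source, starting at offset `start`, into target.

    Returns (next source offset to read, {source_offset: target_offset}).
    """
    offset_map = {}
    pos = start
    while pos < source.end_offset() and len(offset_map) < max_records:
        offset_map[pos] = target.append(source.log[pos])
        pos += 1
    return pos, offset_map


def consumer_lag(source, committed):
    """Records written to the source partition but not yet mirrored."""
    return source.end_offset() - committed


source = Partition()
for value in ["a", "b", "c", "d", "e"]:
    source.append(value)

# The target partition already holds two records, so offsets will not line up.
target = Partition(["x", "y"])

committed, mapping = mirror(source, target, start=0, max_records=3)
print(mapping)                           # source offsets 0-2 land at target offsets 2-4
print(consumer_lag(source, committed))   # two source records still unmirrored
```

Any tool that tracks progress by offset (for example, a consumer failing over to the other data center) therefore needs its own mapping or must fall back to another reference point, such as timestamps or message keys.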
Created 09-06-2016 03:33 PM
Created 09-06-2016 03:49 PM
I'm not sure I understand the "versus" framing here. MirrorMaker can be used to replicate data from one Kafka cluster to another. The NiFi site-to-site protocol can be used to replicate data from one NiFi cluster to another. Both support the appropriate security mechanisms. NiFi offers fine-grained provenance/lineage, but arguably Kafka's log replication/offset mechanism is sufficient for the replication case. As for tuning, both again offer strong tuning/throughput mechanisms.
I'd recommend using the facilities of each.