I am new to building data pipelines with Kafka and NiFi, and I'm trying to build a NiFi flow using a Kafka publisher and consumer. There's a particular doubt I have about how PublishKafka, topics, consumers, and ConsumeKafka fit together.
I have 3 Kafka brokers running on 3 nodes, so I created a Kafka topic named "test01" on each node. Then, when I configure the PublishKafka processor in NiFi, I set the 3 broker hostnames and the topic name as follows:
- Kafka Brokers: hdf01.local:6667, hdf02.local:6667, hdf03.local:6667
- Topic Name: test01
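For reference, this is roughly the command I ran to create the topic (a sketch; the ZooKeeper host/port and the replication/partition values are from my environment and memory, so treat them as approximate):

```shell
# Sketch: creating the topic from the Kafka bin directory on one node.
# The --zookeeper flag matches the older Kafka CLI that ships with my HDF install;
# hostnames, port, and counts are assumptions from my setup.
./kafka-topics.sh --create \
  --zookeeper hdf01.local:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic test01
```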
This works fine; when I SSH into a node and run the console consumer, it shows the data from the flowfiles:
./kafka-console-consumer.sh --zookeeper hdf01.local:2181 --topic test01
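In case it matters, I can also read the topic going through the brokers directly instead of ZooKeeper (a sketch, assuming the newer console consumer that accepts --bootstrap-server; hostname and port are from my environment):

```shell
# Sketch: consuming via the brokers directly (new consumer API);
# hdf01.local:6667 is an assumption from my setup.
./kafka-console-consumer.sh \
  --bootstrap-server hdf01.local:6667 \
  --topic test01 \
  --from-beginning
```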
So when I configure the ConsumeKafka processor in NiFi, I set the properties as:
- Kafka Brokers: hdf01.local:6667, hdf02.local:6667, hdf03.local:6667
- Topic Name: test01
- Group ID: 91802*
* I checked the available consumer group IDs by running this over SSH on one of the nodes:
./zookeeper-shell.sh hdf01.local:2181 ls /consumers
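I also tried listing the groups through the Kafka tooling itself, in case ConsumeKafka uses the new consumer API (which, as far as I understand, stores offsets in Kafka rather than ZooKeeper, so the group might only show up this way). A sketch, with hostname and port assumed from my setup:

```shell
# Sketch: listing consumer groups known to the brokers (new consumer API);
# hdf01.local:6667 is an assumption from my environment.
./kafka-consumer-groups.sh \
  --bootstrap-server hdf01.local:6667 \
  --list
```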
And everything works fine, but I still don't understand whether it's necessary to create the topic on every node to get parallelism, or whether creating it once would give the same result. Also, what's the difference between listing all the Kafka brokers in the processor properties versus listing just one?
Thank you all in advance!