We have a 3-node Kafka broker cluster along with a 3-node ZooKeeper ensemble. I have some questions about how topics are created in the cluster.
I am running the following command to create a topic:
./kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic portfolio_break_stat
Here are my questions:
1. If I set --replication-factor 1, on which node will the topic be created? Will it be created on all nodes or only on one? If only on one, how does Kafka decide which?
2. If I do not specify --replication-factor, how is the topic created and which node will it pick?
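First off, to prevent data loss you should ideally use more than one replica (the replication factor cannot exceed the number of brokers, so with 3 brokers the maximum is 3). For better throughput, use more than one partition, since a partition is the unit of parallelism for the consumers in a group.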
When you describe the topic (kafka-topics.sh --describe), it lists the leader and the replicas of each partition as broker IDs. To know where the data is physically stored, you need to note which broker ID (broker.id) belongs to which machine, and each broker's data directory (log.dirs); a partition is kept in a subdirectory named after the topic and partition number, e.g. portfolio_break_stat-0.
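A minimal sketch for reading that output, assuming a bash shell and the tab-separated `--describe` format of kafka-topics.sh (its exact columns vary a little between Kafka versions). The helper `leaders_from_describe` is hypothetical: it reads the describe output on stdin and prints one `partition leader replicas` line per partition, matching on the `Partition:`, `Leader:` and `Replicas:` labels rather than on fixed column positions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise `kafka-topics.sh --describe` output.
# Reads the describe output on stdin and, for every per-partition line,
# prints "<partition> <leader broker ID> <replica broker IDs>".
leaders_from_describe() {
  awk '
    {
      p = ""; l = ""; r = ""
      # Match on the labels instead of column positions, because the
      # layout differs slightly between Kafka versions.
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") p = $(i + 1)
        else if ($i == "Leader:") l = $(i + 1)
        else if ($i == "Replicas:") r = $(i + 1)
      }
      # The topic summary line uses "PartitionCount:", so it is skipped.
      if (p != "") print p, l, r
    }'
}
```

For example, `./kafka-topics.sh --describe --zookeeper zookeeper:2181 --topic portfolio_break_stat | leaders_from_describe` might print `0 2 2`, meaning partition 0 is led by, and only stored on, broker 2. `./zookeeper-shell.sh zookeeper:2181 get /brokers/ids/2` then shows the registration of broker 2, including its host, and the data sits under that host's log.dirs in portfolio_break_stat-0.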
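As for how Kafka decides: the topic's metadata is known to every broker, but each partition's data is stored only on the brokers in its replica list. So with --replication-factor 1 and --partitions 1, the data lives on exactly one broker. That replica list is fixed when the topic is created: by default Kafka spreads replicas round-robin over the available brokers, starting from a randomly chosen one (and also spreads them across racks when broker.rack is set). The first broker in each list is the preferred leader, and the controller (one of the brokers, elected through ZooKeeper) makes it the partition leader. ZooKeeper stores this metadata and is used to elect the controller, but it does not run a per-partition leader election itself.
If you leave out --replication-factor: with --zookeeper, as in the command above, kafka-topics.sh rejects the command unless you pass an explicit --replica-assignment; with --bootstrap-server (Kafka 2.4 and later) it is optional, and the broker's default.replication.factor (1 unless configured) is used. The Kafka documentation covers the details if you are curious.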
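To make the default placement concrete, here is a bash sketch of the rack-unaware round-robin assignment, modelled on my reading of Kafka's `AdminUtils` source; treat it as an approximation rather than the authoritative implementation. The function name `assign_replicas` is hypothetical, the broker IDs are assumed to be passed in sorted order, and the start index and follower shift, which Kafka picks at random, are passed in explicitly so the result is reproducible.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Kafka's default (rack-unaware) replica placement.
# Usage: assign_replicas PARTITIONS RF START_INDEX SHIFT BROKER_ID...
# Prints "<partition>: <replica list>"; the first ID is the preferred leader.
assign_replicas() {
  local nparts=$1 rf=$2 start=$3 replica_shift=$4
  shift 4
  local brokers=("$@")
  local n=${#brokers[@]}
  # Kafka refuses a replication factor larger than the broker count.
  if (( rf < 1 || rf > n )); then
    echo "replication factor must be between 1 and $n" >&2
    return 1
  fi
  local p j first step line
  for ((p = 0; p < nparts; p++)); do
    # After every full pass over the brokers, bump the follower offset
    # so that replicas do not always pair up the same way.
    if (( p > 0 && p % n == 0 )); then
      replica_shift=$((replica_shift + 1))
    fi
    # The first replica (preferred leader) moves round-robin from START_INDEX.
    first=$(( (p + start) % n ))
    line=${brokers[first]}
    # Remaining replicas sit at an offset of 1..n-1 from the first one.
    for ((j = 0; j < rf - 1; j++)); do
      step=$(( 1 + (replica_shift + j) % (n - 1) ))
      line+=",${brokers[(first + step) % n]}"
    done
    echo "$p: $line"
  done
}
```

Assuming hypothetical broker IDs 0, 1 and 2, `assign_replicas 1 1 $((RANDOM % 3)) $((RANDOM % 3)) 0 1 2` mimics the command in the question: it prints a single line such as `0: 1`, i.e. the whole topic lands on one randomly chosen broker. With more partitions, e.g. `assign_replicas 6 2 0 0 0 1 2`, the leaders rotate over all three brokers.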
Forcing leaders is also possible: http://blog.erdemagaoglu.com/post/128624804243/forcing-kafka-partition-leaders
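One way to do this with the stock tools is to rewrite a partition's replica list so that the broker you want comes first (making it the preferred leader) and then trigger a preferred-leader election. Below is a bash sketch that builds the reassignment file in the JSON layout accepted by `kafka-reassign-partitions.sh --reassignment-json-file`; the helper name `reassignment_json` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a reassignment plan giving one partition the
# replica list BROKER_ID... in that order (the first ID is the preferred leader).
# Usage: reassignment_json TOPIC PARTITION BROKER_ID...
reassignment_json() {
  local topic=$1 partition=$2
  shift 2
  # With IFS set to ",", "$*" joins the remaining broker IDs with commas.
  local IFS=,
  printf '{"version":1,"partitions":[{"topic":"%s","partition":%s,"replicas":[%s]}]}\n' \
    "$topic" "$partition" "$*"
}
```

For example, `reassignment_json portfolio_break_stat 0 2 > reassign.json` followed by `./kafka-reassign-partitions.sh --zookeeper zookeeper:2181 --reassignment-json-file reassign.json --execute` would move the single replica of partition 0 to broker 2. For a replicated topic, reordering an existing list (e.g. `2 0 1`) and then running `./kafka-preferred-replica-election.sh --zookeeper zookeeper:2181` changes only the leader. Newer Kafka versions take `--bootstrap-server` instead, and there the election tool is `kafka-leader-election.sh --election-type preferred`.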