Support Questions

Jagatheeshr · ‎09-30-2015

For a customer's Flume Kafka sink,

we have defined agent.sinks.kafka-netflow-ci.brokerList=edge01:9092,edge02:9092,edge03:9092

The question is: how does the sink use the broker list?

Are edge02 and edge03 used only if edge01 fails, or are the brokers used randomly?

I found the text below on the internet, but I am still not clear on this.

The brokers the Kafka sink uses to discover topic partitions, formatted as a comma-separated list of hostname:port entries. You do not need to specify the entire list of brokers, but Cloudera recommends that you specify at least two for high availability.

schintalapani · ‎09-30-2015

broker.list for Kafka consumers or producers is used for bootstrapping. The consumer or producer makes a request to get TopicMetadata, which tells the client what the topic partitions are and which brokers are the leaders for those partitions, so that the client can send requests to the leaders.

To answer your question: brokerList is shuffled, and the client goes through the hosts one by one, making a TopicMetadataRequest to each. If a request succeeds, it returns; if not, it continues to the next broker.
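
The behavior described above can be sketched roughly as follows. This is a plain Python simulation, not Kafka or Flume client code: `fetch_topic_metadata`, `fake_send_request`, and the shape of the returned metadata are hypothetical names chosen for illustration, and a broker failure is assumed to surface as a `ConnectionError`.

```python
import random


def fetch_topic_metadata(broker_list, send_request, rng=random):
    """Try the brokers in shuffled order; return the first successful reply."""
    brokers = [b.strip() for b in broker_list.split(",") if b.strip()]
    rng.shuffle(brokers)  # no broker in the list is treated as "primary"
    failed = []
    for broker in brokers:
        try:
            return broker, send_request(broker)
        except ConnectionError:
            failed.append(broker)  # note the failure and try the next broker
    raise ConnectionError(f"no broker in the list answered: {sorted(failed)}")


def fake_send_request(broker):
    """Hypothetical stand-in for a metadata request; edge01 is assumed down."""
    if broker.startswith("edge01"):
        raise ConnectionError(f"{broker} unreachable")
    return {"topic": "example-topic", "answered_by": broker}


if __name__ == "__main__":
    broker, metadata = fetch_topic_metadata(
        "edge01:9092,edge02:9092,edge03:9092", fake_send_request
    )
    print("metadata served by", broker)
```

In this sketch edge02 and edge03 are not standbys for edge01: any of the three may be asked first, and a failed broker is simply skipped.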

Jagatheeshr · ‎10-01-2015

@schintalapani@hortonworks.com. Thanks. Just wanted to clarify: is the leader of a partition only relevant when we have replication in place? If we don't have any replication (replication factor 1), does the client just need to look up the topic's partition list and connect to each partition based on the partition key?

schintalapani · ‎10-01-2015

Even if there is no replication, there will be a leader. The client still makes a call to a broker in broker.list to find out who the leader is for a given topic-partition. Partitions are spread across the cluster, so the client still needs to know who the leader of a partition is, irrespective of replication.
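
As a rough illustration of that point, the sketch below builds the kind of per-partition metadata a client might hold. `PartitionInfo` and `assign_partitions` are hypothetical names, and the placement rule is a deliberately simplified assumption, not Kafka's actual assignment algorithm. With a replication factor of 1, each partition still has exactly one leader; it just has no followers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    partition: int
    leader: str      # broker that serves reads and writes for this partition
    replicas: tuple  # all brokers holding a copy, leader included


def assign_partitions(brokers, num_partitions, replication_factor):
    """Simplified placement: spread partitions over brokers in order."""
    infos = []
    for p in range(num_partitions):
        replicas = tuple(
            brokers[(p + i) % len(brokers)] for i in range(replication_factor)
        )
        # The first replica acts as leader; with factor 1 it is the only copy.
        infos.append(PartitionInfo(p, leader=replicas[0], replicas=replicas))
    return infos


if __name__ == "__main__":
    brokers = ["edge01:9092", "edge02:9092", "edge03:9092"]
    for info in assign_partitions(brokers, num_partitions=3, replication_factor=1):
        print(info)
```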

Jagatheeshr · ‎10-01-2015

@schintalapani@hortonworks.com. Considering our case: we have 3 dedicated Kafka nodes with a topic that has 3 partitions (no replication), and we are using random key partitioning. In this case, the producer will pick a random partition, pipe data to it for a while (say the default 10 min), and then move on to another partition. As all writes and reads go through the leader of the partition, how does this work when there is no replication? Does the leader change every 10 mins?

The above assumptions are based on the lines below.

Why Data is not Evenly Distributed in Kafka Partition ?

From the link here

Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.

Correct me if I am wrong.

schintalapani · ‎10-01-2015

The partitions will be distributed among the Kafka nodes. If you created a topic with 3 partitions and you have 3 Kafka nodes, then each node will get a single topic partition and will be the leader for it. When there is no key in the messages you are trying to write, the Kafka client picks the nodes in round-robin fashion: it writes to one topic partition and then moves on to the next one.

If you provide a key, it will do hash-based partitioning to determine which topic partition to write to.
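
A minimal sketch of the two cases described above, written as standalone Python rather than actual Kafka client code. `SketchPartitioner` is a hypothetical name, and `zlib.crc32` is only a stand-in for whatever hash function a real client uses.

```python
import itertools
import zlib


class SketchPartitioner:
    """Round robin for keyless messages, hash of the key otherwise."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key=None):
        if key is None:
            # No key: cycle through partitions 0, 1, 2, 0, 1, 2, ...
            return next(self._counter) % self.num_partitions
        # Key present: the same key always maps to the same partition.
        return zlib.crc32(key) % self.num_partitions


if __name__ == "__main__":
    partitioner = SketchPartitioner(num_partitions=3)
    print([partitioner.partition() for _ in range(6)])
    print(partitioner.partition(b"router-17"), partitioner.partition(b"router-17"))
```

Whichever partition is chosen, the write is then sent to the broker that leads that partition, so cycling over partitions also means cycling over their leaders.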

Jagatheeshr · ‎10-01-2015

@schintalapani@hortonworks.com. The documentation says "The leader handles all read and write requests for the partition while the followers passively replicate the leader." Whereas we are talking about "Kafka client does a round robin picking of each node and writes to that topic partition and moves on to next one." If only the leader of a partition can handle reads and writes, how can the Kafka client perform round robin here across all the partitions?

Aren't these mutually exclusive ?

Jagatheeshr · ‎10-08-2015

@vramachandran@hortonworks.com. Thanks a lot.

Cloudera Community

How does BrokerList in Kafka Sink work ?