Just curious: where should we install Kafka producers and consumers?
Is it on the same hosts where we installed the Kafka brokers?
Or on other nodes of the cluster, where we have installed the Spark slaves and clients?
FYI, we are planning a 30-node cluster. On 5 nodes we plan to install Kafka brokers, and on the remaining nodes we will install other Hadoop components, including the Spark master and Spark slaves/clients.
Producers and consumers do not need to be colocated with the Kafka brokers. In fact, it is very common for them not to be. Often the producers run on systems outside the HDP cluster altogether and send messages to the Kafka brokers over the network. Likewise, it is a common architecture to dedicate nodes to the Kafka brokers, because of Kafka's load profile and disk usage patterns, and have the consumers connect to the brokers from other nodes.
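To illustrate the point that clients only need a network address for the brokers, here is a minimal sketch in Python that builds the client-side settings for a producer and a consumer running on non-broker nodes. The hostnames, the port, and the group name are made up for the example (6667 is assumed because it is often the HDP default, while stock Apache Kafka uses 9092). `bootstrap.servers`, `acks`, and `group.id` are standard Kafka client settings; the helper functions themselves are hypothetical.

```python
# Hypothetical hostnames for the 5 dedicated broker nodes.
BROKER_HOSTS = [
    "kafka1.example.com",
    "kafka2.example.com",
    "kafka3.example.com",
    "kafka4.example.com",
    "kafka5.example.com",
]
# Assumption: 6667 is often the HDP default; stock Apache Kafka uses 9092.
BROKER_PORT = 6667


def bootstrap_servers(hosts, port):
    """Join broker hosts into the comma-separated host:port list clients expect."""
    return ",".join(f"{host}:{port}" for host in hosts)


def producer_config(hosts, port):
    """Settings for a producer on any machine that can reach the brokers."""
    return {
        "bootstrap.servers": bootstrap_servers(hosts, port),
        "acks": "all",  # wait for all in-sync replicas to acknowledge
    }


def consumer_config(hosts, port, group_id):
    """Settings for a consumer, e.g. on a Spark node rather than a broker node."""
    return {
        "bootstrap.servers": bootstrap_servers(hosts, port),
        "group.id": group_id,
    }


if __name__ == "__main__":
    print(producer_config(BROKER_HOSTS, BROKER_PORT))
    print(consumer_config(BROKER_HOSTS, BROKER_PORT, "example-group"))
```

Nothing in these settings refers to the host the client itself runs on, so the same dictionaries would work from an external system or from any of the other 25 nodes, as long as the brokers are reachable over the network.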
Thank you very much for your response.
Can Kafka consumers and the Spark client reside on the same nodes?
Will there be any exchange of data between Kafka consumers and Spark clients?
My understanding was that Spark jobs read data directly from the Kafka brokers (earlier, it was from ZooKeeper).
Also, I came across a claim that the number of producers should be equal to the number of consumers.