I have a 6-node HDP 2.5 cluster (1 edge node, 1 master, 1 secondary master, 3 data nodes) running on Azure VMs. I want to install Kafka on that cluster, along with the Kafka Connect API and the Kafka Streams API. When I go to "Add Service" in Ambari, it shows Kafka 0.10 for HDP 2.5.
What would be the best way to install Kafka on HDP 2.5 in my case? How many brokers should I install, and where should the brokers and ZooKeeper run?
With only 6 nodes in the cluster you will have to find a compromise. Ideally, the Kafka brokers would be installed on dedicated machines. With 3 data nodes you are probably just running showcases, so if the load is not very high, you can install the brokers on the data nodes and ZooKeeper on the master and secondary master. Keep in mind that the brokers need local disk space for their logs, and that space is then no longer available to HDFS.
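To get a feeling for how much disk the brokers would take away from HDFS, a back-of-the-envelope estimate can help. The sketch below is only a rough rule of thumb, not something Ambari or Kafka computes for you. The function name, the 30% headroom factor, and the example ingest rate are assumptions made for illustration, and it assumes partitions are spread evenly across brokers.

```python
def kafka_disk_per_broker_gb(ingest_mb_per_s, retention_hours,
                             replication_factor, brokers, headroom=1.3):
    """Rough disk needed per broker in GB (hypothetical helper).

    Assumes partitions are evenly distributed and that 30% extra
    space (headroom) is kept free; both are illustrative assumptions.
    """
    if brokers < 1:
        raise ValueError("need at least one broker")
    if replication_factor > brokers:
        # Kafka cannot place more replicas of a partition than there are brokers.
        raise ValueError("replication factor cannot exceed broker count")
    # Data written during the retention window, counted once per replica.
    total_mb = ingest_mb_per_s * 3600 * retention_hours * replication_factor
    return total_mb * headroom / brokers / 1024


# Example: 1 MB/s ingest, 168 hours retention (Kafka's default),
# replication factor 3, brokers on the 3 data nodes.
print(round(kafka_disk_per_broker_gb(1, 168, 3, 3)))  # roughly 768 GB per data node
```

If the result is a large share of a data node's disks, that is a hint to lower retention or to move the brokers elsewhere.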
If the load on your data nodes is higher, but you don't expect much load on Kafka or on your edge node, you could instead install a single broker on the edge node.
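The two layouts differ in how much replication they allow, because a topic's replication factor cannot be higher than the number of brokers. The sketch below illustrates that trade-off. The helper name and the "cap at 3, keep one replica of slack" rule are assumptions (a common rule of thumb, not part of the answer above); `default.replication.factor` and `min.insync.replicas` are the corresponding Kafka broker settings.

```python
def suggested_topic_settings(brokers):
    """Suggest replication settings for a given broker count (hypothetical helper)."""
    if brokers < 1:
        raise ValueError("need at least one broker")
    # Replication factor is limited by the broker count; 3 is a common cap.
    replication_factor = min(3, brokers)
    # Leave one replica of slack where possible, but never go below 1.
    min_insync = max(1, replication_factor - 1)
    return {
        "default.replication.factor": replication_factor,
        "min.insync.replicas": min_insync,
    }


print(suggested_topic_settings(3))  # brokers on the three data nodes
print(suggested_topic_settings(1))  # single broker on the edge node
```

With a single broker on the edge node there is no replication at all, so topics are unavailable whenever that node is down. That is usually acceptable for a showcase, but worth knowing before choosing this layout.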