Kafka cluster design

Explorer

Hi

I am building a data lake with HDP, where Kafka will be used to ingest all the data.

I have two options:
1. One cluster for everything, with Kafka deployed exclusively on a dedicated set of nodes.
2. One HDP cluster for storage and processing, and a separate cluster running only Kafka.

Which approach is better, and what are the pros and cons of each?

How should I size the Kafka part?
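A common back-of-envelope approach to the sizing question is to estimate a broker count from two constraints, retained disk space and sustained write throughput, and take the larger. The Python sketch below does only that. All parameter names and example numbers are hypothetical: `broker_write_mb_s` should come from benchmarking your own hardware, and the 70% disk-fill target is an assumption, not a Kafka default.

```python
import math


def kafka_broker_estimate(
    daily_ingest_gb: float,            # raw data produced per day, before replication
    retention_days: float,             # how long Kafka keeps the data
    replication_factor: int,           # copies of each partition
    usable_disk_per_broker_gb: float,  # disk available for Kafka logs on one broker
    peak_ingest_mb_s: float,           # peak producer throughput, before replication
    broker_write_mb_s: float,          # sustained write rate one broker absorbs (assumption: measure it)
    disk_fill_target: float = 0.7,     # assumption: keep disks at most ~70% full
) -> dict:
    # Storage: every byte is kept replication_factor times for retention_days.
    stored_gb = daily_ingest_gb * retention_days * replication_factor
    by_disk = math.ceil(stored_gb / (usable_disk_per_broker_gb * disk_fill_target))

    # Throughput: brokers write both leader and follower copies, so the total
    # cluster write load is the peak ingest rate times the replication factor.
    by_throughput = math.ceil(peak_ingest_mb_s * replication_factor / broker_write_mb_s)

    # A topic with replication factor N needs at least N brokers.
    brokers = max(by_disk, by_throughput, replication_factor)
    return {
        "stored_gb": stored_gb,
        "by_disk": by_disk,
        "by_throughput": by_throughput,
        "brokers": brokers,
    }


if __name__ == "__main__":
    # Hypothetical workload: 1.5 TB/day, 7 days retention, RF 3,
    # 12 TB usable disk per broker, 50 MB/s peak, 100 MB/s per broker.
    print(kafka_broker_estimate(1500, 7, 3, 12000, 50, 100))
```

With those hypothetical numbers the estimate is 4 brokers, driven by disk rather than throughput. Partition counts, consumer read load and network bandwidth are further dimensions this sketch does not model.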

Accepted solution

Master Guru

Since you plan to dedicate nodes to Kafka in your "cluster for everything", Kafka performance will be the same as in a stand-alone Kafka cluster. However, it's good practice to give Kafka its own ZooKeeper quorum, and that is harder with the first option: Ambari currently doesn't support two ZK quorums per cluster, so you would need to install the ZK for Kafka manually. That's not very complicated, but with a stand-alone Kafka cluster you can use Ambari to install and manage its ZK as well. So my recommendation is to go for a stand-alone Kafka cluster.
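As a hedged illustration of what a dedicated ZooKeeper quorum means on the broker side, the Python sketch below renders the relevant lines of a Kafka broker's `server.properties`. `zookeeper.connect` lists only the hosts of the Kafka-specific ensemble instead of the ZooKeeper servers the HDP cluster already uses. The host names `kafka-zk1` to `kafka-zk3`, the `/kafka` chroot and the log directory are hypothetical, and the sketch assumes a ZooKeeper-based Kafka release such as those shipped with HDP. It also computes how many server failures an ensemble survives, since ZooKeeper needs a strict majority to keep a quorum.

```python
from typing import List


def zk_connect_string(hosts: List[str], port: int = 2181, chroot: str = "/kafka") -> str:
    """Build a zookeeper.connect value: host:port pairs followed by an optional chroot path."""
    return ",".join(f"{host}:{port}" for host in hosts) + chroot


def tolerated_failures(ensemble_size: int) -> int:
    """A quorum needs a strict majority, so n servers survive (n - 1) // 2 failures."""
    return (ensemble_size - 1) // 2


def broker_properties(broker_id: int, log_dirs: str, zk_hosts: List[str]) -> str:
    """Render the server.properties lines that tie one broker to the dedicated ensemble."""
    return "\n".join([
        f"broker.id={broker_id}",
        f"log.dirs={log_dirs}",
        f"zookeeper.connect={zk_connect_string(zk_hosts)}",
    ])


# Hypothetical host names for a 3-node ZooKeeper ensemble used only by Kafka.
KAFKA_ZK = ["kafka-zk1", "kafka-zk2", "kafka-zk3"]

if __name__ == "__main__":
    # "/data/kafka-logs" is a hypothetical log directory.
    print(broker_properties(1, "/data/kafka-logs", KAFKA_ZK))
    print("ZK failures tolerated:", tolerated_failures(len(KAFKA_ZK)))
```

An even-sized ensemble adds no fault tolerance (4 servers still survive only 1 failure), which is why 3 or 5 ZooKeeper servers are the usual choices.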


Replies

Rising Star

@Predrag Minovic, can you explain why Kafka needs its own ZooKeeper quorum? Why can't it use an existing ZooKeeper quorum? We are migrating to Kafka in production and I would like your take on this.

Rising Star

@David Lays, please let me know which Kafka design you finally went with: Kafka on dedicated nodes of the main cluster, or a separate Kafka cluster. We are facing exactly the same design dilemma for our Kafka installation.

Thanks very much in advance.