
Kafka using Docker for production clusters


We need to build a production Kafka cluster with 3-5 nodes.

We have the following options:

  1. Kafka in Docker containers (the Kafka cluster includes ZooKeeper and Schema Registry on each node)
  2. Kafka cluster not using Docker (the Kafka cluster includes ZooKeeper and Schema Registry on each node)

Since we are talking about a production cluster, we need good performance: we have heavy reads/writes to disk (disk size is 10 TB), and we need good I/O performance, etc.

So, does Kafka on Docker meet the requirements for production clusters?

More info: https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment

Michael-Bronson

7 REPLIES

Master Mentor

@mike_bronson7 

Your plan is doable, and that's the way many companies have deployed their Kafka production clusters if they intend ONLY to use Kafka. But you could take it a step further for HA and reliability: orchestrating all of that with Kubernetes and PVCs is a great idea.

Running Kafka as a microservice on Kubernetes has become the norm and the path of least resistance. It is often very difficult to allocate physical machines with local disks for Kafka, and companies running on VMs have found that deploying Kafka outside of Kubernetes causes significant organizational headaches.

Running Kafka on Kubernetes gets your environment allocated faster, and you can spend your time on productive work rather than firefighting. Kafka management becomes much easier on Kubernetes: adding new brokers is a single command or a single line in a configuration file, and it is easier to perform configuration changes, upgrades, and restarts on all brokers and all clusters.
Kafka is a stateful service, and this does make the Kubernetes configuration more complex than it is for stateless microservices. The biggest challenge is configuring storage and networking; you'll want to make sure both subsystems deliver consistently low latency, which is where PVCs (Persistent Volume Claims) on shared storage come in.
The beauty is that Kafka runs as pods: you can configure a fixed number that MUST be running at any time and scale when needed with a single kubectl or Helm command, as the sketch below shows. That is elasticity at play!
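For instance, assuming the brokers run as a StatefulSet named kafka (a hypothetical name; a Helm chart would have its own value for the replica count), scaling out is one command:

```bash
# Scale the (hypothetical) "kafka" StatefulSet from 3 to 5 brokers;
# Kubernetes creates the new pods and their PVCs from the same template.
kubectl scale statefulset kafka --replicas=5

# With a Helm-managed deployment the equivalent is overriding the chart's
# replica value; the chart path and value name here are placeholders.
helm upgrade kafka ./kafka-chart --set replicaCount=5
```

Note that Kafka itself does not rebalance automatically: existing partitions still have to be reassigned to the new brokers.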

Kafka also poses a challenge most stateful services don't: brokers are not interchangeable, and clients need to communicate directly with the broker that holds the lead replica of each partition they produce to or consume from. You can't place all brokers behind a single load-balancer address; you must devise a way to route messages to a specific broker. A good read on this is the paper Recommendations for Deploying Apache Kafka on Kubernetes.
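One common approach (sketched here with illustrative names) is a headless Service, which gives every broker pod its own stable DNS name instead of one load-balanced address:

```bash
# A headless Service (clusterIP: None) gives each broker pod a stable,
# per-pod DNS name (kafka-0.kafka.<namespace>..., kafka-1.kafka...), so
# clients and advertised listeners can reach a specific broker directly.
kubectl create service clusterip kafka --clusterip="None" --tcp=9092:9092
```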

 

Happy hadooping

 

First, I just want to say thank you for the whole explanation.

But for now we can't work with Kubernetes (for some internal reasons), so the option is to work with Docker.

Based on that, do you think a Kafka cluster using Docker will have lower performance than a Kafka cluster without Docker?

Michael-Bronson

Master Mentor

@mike_bronson7 

 

Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments poses some challenges including container management, scheduling, network configuration and security, and performance.

By default, containerized applications have no resource constraints: a container can use as much of a given resource as the host's kernel scheduler allows, and its access to the host machine's CPU cycles is unlimited. It is important not to allow a running container to consume too much of the host machine's memory, so set explicit limits.
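As a minimal sketch of doing that (the image tag, sizes, hostnames, and paths below are illustrative assumptions, not a recommendation), a broker container can be started with explicit limits:

```bash
# Hard-cap the container's memory (no extra swap) and limit its CPUs so the
# broker cannot starve the host. Keep the JVM heap well below the container
# limit to leave room for the OS page cache that Kafka relies on.
docker run -d --name kafka-broker-1 \
  --memory=8g \
  --memory-swap=8g \
  --cpus=4 \
  -e KAFKA_BROKER_ID=1 \
  -e KAFKA_HEAP_OPTS="-Xms4g -Xmx4g" \
  -e KAFKA_ZOOKEEPER_CONNECT=zk1:2181,zk2:2181,zk3:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker1.example.com:9092 \
  -v /data/kafka:/var/lib/kafka/data \
  confluentinc/cp-kafka:5.3.1
```

Mounting the data directory from the host's local disk also matters for your 10 TB, I/O-heavy workload, since it keeps the log segments off the container's writable layer.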

As you are aware, Kafka needs ZooKeeper, so you have to architect your Kafka deployment well; but once you master it, it's a piece of cake, and it brings a lot of advantages like easier upgrades, scaling out, etc.

 

As I reiterated, a good move is to get your hands dirty 🙂


Just to quote what you said: "some challenges including container management, scheduling, network configuration and security, and performance".

 

So I understand that you think containers can have negative effects on performance.

 

The question is whether this is a very minor effect or maybe a major effect on performance.

 

As I mentioned, we have three choices:

  1. install a Kafka cluster from Confluent (with ZooKeeper and Schema Registry), OR
  2. install Kafka using Docker, with ZooKeeper and Schema Registry from Confluent, OR
  3. install a Kafka cluster from the HDF kit (with Kafka + ZooKeeper + Schema Registry).

Please give your professional opinion: which is the best Kafka cluster of these three options, when focusing on the performance side in a production environment?

Michael-Bronson

Master Mentor

@mike_bronson7 

 

Confluent and Kafka are inseparable 🙂 HDF also has good tooling around Kafka, but what you decide on usually depends on the skillsets at hand. Containerized apps are now the norm, for the reasons shared before; nevertheless, HDF 3.1 is packaged with SAM, NiFi, Ambari, Registry, and Ranger, which makes it quite a complete offering.

But with the Dockerized version you have too many moving parts, and synchronizing Kafka, ZooKeeper, and the Registry could be a challenge without the right skillsets; the positive side is easier upgrades and deployment, and OS-agnostic portability.

The choice is yours 🙂


You mentioned the HDF kit.

Until now we have worked with HDP and Ambari. Is HDF the same concept as HDP? (Does it include blueprints, in case we want to automate the installation process?)

Michael-Bronson

Master Mentor

@mike_bronson7 

Yes, it's possible to deploy HDF using Ambari blueprints. If you compare an HDP and an HDF blueprint, you will notice a difference only in the components section.

Deploy HDF 1 using a blueprint

Deploy HDF 2 using a blueprint

Deploy HDF 3 using a blueprint

The links above show the possibility.
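As a rough sketch of the idea (host, credentials, stack version, and component set are placeholders, and a real blueprint would also carry configuration sections), registering a minimal HDF blueprint against Ambari's REST API looks like this; an HDP blueprint would differ mainly in the stack name and the components list:

```bash
# Register a minimal HDF blueprint with Ambari's REST API.
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  http://ambari-host:8080/api/v1/blueprints/hdf-kafka \
  -d '{
    "Blueprints": {
      "blueprint_name": "hdf-kafka",
      "stack_name": "HDF",
      "stack_version": "3.1"
    },
    "host_groups": [
      {
        "name": "kafka_nodes",
        "cardinality": "3",
        "components": [
          { "name": "ZOOKEEPER_SERVER" },
          { "name": "KAFKA_BROKER" }
        ]
      }
    ]
  }'
```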