Created on 10-29-2019 06:49 AM - last edited on 10-29-2019 07:22 AM by cjervis
We need to build a production Kafka cluster with 3-5 nodes.
We have the following options:
Since we are talking about a production cluster, we need good performance: heavy reads/writes to disk (disk size is 10 TB), good I/O performance, etc.
So does running Kafka on Docker meet the requirements for production clusters?
more info - https://www.infoq.com/articles/apache-kafka-best-practices-to-optimize-your-deployment
Created on 10-29-2019 11:46 AM - edited 10-29-2019 11:49 AM
Your plans are doable, and that's the way many companies have deployed their Kafka production clusters if you intend ONLY to use Kafka. You could take it a step further, though, by adding HA and reliability and orchestrating all of that with Kubernetes and PVCs; that's a great approach.
Running Kafka as a microservice on Kubernetes has become the norm and the path of least resistance. It is often difficult to allocate physical machines with local disks just for Kafka, and companies running on VMs have found that deploying Kafka outside of Kubernetes causes significant organizational headaches.
Running Kafka on Kubernetes gets your environment allocated faster, and you can spend your time on productive work rather than firefighting. Kafka management also becomes much easier on Kubernetes: scaling out by adding new brokers is a single command or a single line in a configuration file, and it is easier to perform configuration changes, upgrades, and restarts across all brokers and all clusters.
Kafka is a stateful service, and this does make the Kubernetes configuration more complex than it is for stateless microservices. The biggest challenge is configuring storage and networking; you'll want to make sure both subsystems deliver consistently low latency, and that is where PVCs (Persistent Volume Claims) backed by shared storage come in.
The beauty is that each Kafka broker runs as a pod: you can configure a fixed number that MUST be running at any time and scale out when needed with a single kubectl or Helm command. That is elasticity at play!
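To make that concrete, here is a minimal sketch using the official Kubernetes Python client to scale a broker StatefulSet; the StatefulSet name and namespace ("kafka") are assumptions that depend on whichever chart or manifests you deploy with.

```python
# Minimal sketch: scale a Kafka StatefulSet with the Kubernetes Python client.
# Assumes an existing StatefulSet named "kafka" in namespace "kafka"
# (names depend on your Helm chart / manifests).
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when running inside a pod
apps = client.AppsV1Api()

# Bump the broker count to 5; the StatefulSet controller creates the new
# pods (e.g. kafka-3, kafka-4) and binds a PVC to each one.
apps.patch_namespaced_stateful_set_scale(
    name="kafka",
    namespace="kafka",
    body={"spec": {"replicas": 5}},
)
```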
Kafka also poses a challenge most stateful services don't: brokers are not interchangeable, and clients need to communicate directly with the broker that holds the leader replica of each partition they produce to or consume from. You can't place all brokers behind a single load-balancer address; you must devise a way to reach each specific broker. A good read on this is the "Recommendations for Deploying Apache Kafka on Kubernetes" paper.
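To illustrate why the brokers are not interchangeable, here is a minimal producer sketch using the confluent-kafka Python client; the broker hostnames are hypothetical per-broker DNS names (the kind a Kubernetes headless service would give you). The client only uses bootstrap.servers for the initial metadata fetch and then connects directly to each partition leader, so every broker must advertise an address the client can actually reach.

```python
# Minimal sketch with the confluent-kafka Python client.
# Hostnames below are hypothetical per-broker DNS names; adjust to your environment.
from confluent_kafka import Producer

producer = Producer({
    # Only used for the initial metadata request; after that the client talks
    # directly to each partition's leader broker, so every broker's
    # advertised.listeners must be resolvable and reachable from the client.
    "bootstrap.servers": "kafka-0.kafka-headless:9092,kafka-1.kafka-headless:9092",
})

producer.produce("test-topic", key="key1", value="hello")
producer.flush()  # block until the message is delivered (or fails)
```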
Happy hadooping
Created 10-30-2019 12:27 PM
First, I just want to say thank you for all the explanations.
But for now we can't work with Kubernetes (for some internal reasons),
so the option is to work with Docker.
Based on that, do you think a Kafka cluster using Docker will have lower performance than a Kafka cluster without Docker?
Created 10-30-2019 01:32 PM
Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments poses some challenges, including container management, scheduling, network configuration, security, and performance.
By default, containerized applications have no resource constraints: they can use as much of a given resource as the host's kernel scheduler allows, and each container's access to the host machine's CPU cycles is unlimited.
It is important not to allow a running container to consume too much of the host machine's memory, so set explicit memory and CPU limits.
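As a minimal sketch of putting such limits in place, here is a hypothetical broker launch using the Docker SDK for Python; the image, limits, mounts, and environment values are placeholders you would tune for a high-throughput broker with 10 TB disks.

```python
# Minimal sketch: start a Kafka broker container with explicit resource limits
# using the Docker SDK for Python. Image, limits, mounts, and env values are
# placeholders; tune them for your hardware and workload.
import docker

client = docker.from_env()

client.containers.run(
    "confluentinc/cp-kafka:latest",       # hypothetical image choice
    name="kafka-broker-0",
    detach=True,
    mem_limit="8g",                       # hard cap on container memory
    nano_cpus=4_000_000_000,              # 4 CPUs (value is in 1e-9 CPU units)
    volumes={"/data/kafka": {"bind": "/var/lib/kafka/data", "mode": "rw"}},
    network_mode="host",                  # often used for brokers to avoid NAT overhead
    environment={
        "KAFKA_BROKER_ID": "0",
        "KAFKA_ZOOKEEPER_CONNECT": "zk-1:2181,zk-2:2181,zk-3:2181",     # placeholder ensemble
        "KAFKA_ADVERTISED_LISTENERS": "PLAINTEXT://broker1.example.com:9092",  # placeholder
    },
)
```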
As you are aware, Kafka also needs a ZooKeeper ensemble, so you have to architect your Kafka deployment well. Once you master it, it's a piece of cake, and it brings a lot of advantages like easier upgrades, scaling out, etc.
As I said before, it's a good move to get your hands dirty 🙂
Created on 10-30-2019 02:22 PM - edited 10-30-2019 02:24 PM
Just to quote what you said: "some challenges including container management, scheduling, network configuration and security, and performance".
So I understand that you think containers can have a negative impact on performance.
The question is whether this is a very minor effect or maybe a major effect on performance.
As I mentioned, we have two choices:
Install a Kafka cluster from Confluent with ZooKeeper and Schema Registry
OR
Install Kafka using Docker with ZooKeeper and Schema Registry from Confluent
A third choice is:
Install a Kafka cluster from the HDF kit (with Kafka + ZooKeeper + Schema Registry)
Please give your professional opinion:
What is the best Kafka cluster of these three options (focusing on the performance side / production environment)?
Created 10-31-2019 01:13 PM
Confluent and Kafka are inseparable 🙂 HDF also has good tooling around Kafka, but what you decide on usually depends on the skill sets at hand. Containerized apps are now the norm, for the reasons shared before, but nevertheless HDF 3.1 is packaged with SAM, NiFi, Ambari, Schema Registry, and Ranger, quite a complete offering.
With the Dockerized version you have more moving parts, and keeping Kafka, ZooKeeper, and Schema Registry in sync can be a challenge without the right skill sets; the positive side is easier upgrades and deployment, plus OS-agnostic portability.
The choice is yours 🙂
Created on 11-02-2019 04:57 PM - edited 11-02-2019 04:57 PM
You mentioned the HDF kit.
Until now we have worked with HDP and Ambari.
Is HDF the same concept as HDP (including blueprints, in case we want to automate the installation process)?
Created 11-03-2019 06:47 AM
Yes, it's possible to deploy HDF using Ambari blueprints. If you compare an HDP and an HDF blueprint, you will notice a difference only in the components section.
Deploy HDF 1 using a blueprint
Deploy HDF 2 using a blueprint
Deploy HDF 3 using a blueprint
The links above show how it can be done.
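As a minimal sketch (not the exact procedure from those links), registering a blueprint and creating a cluster both go through Ambari's REST API; the hostname, credentials, blueprint name, and JSON file names below are assumptions for illustration.

```python
# Minimal sketch: register an HDF blueprint and create a cluster through
# Ambari's REST API. Hostname, credentials, blueprint name, and the JSON
# files are placeholders for your own environment.
import json
import requests

AMBARI = "http://ambari-server.example.com:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

# 1. Register the blueprint (component layout, e.g. KAFKA_BROKER, ZOOKEEPER_SERVER).
with open("hdf-kafka-blueprint.json") as f:
    blueprint = json.load(f)
requests.post(f"{AMBARI}/blueprints/hdf-kafka",
              auth=AUTH, headers=HEADERS, json=blueprint).raise_for_status()

# 2. Create the cluster from the blueprint plus a host-mapping template.
with open("hdf-kafka-hostmapping.json") as f:
    cluster_template = json.load(f)
requests.post(f"{AMBARI}/clusters/hdf_prod",
              auth=AUTH, headers=HEADERS, json=cluster_template).raise_for_status()
```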