Created 06-27-2019 08:00 AM
We have HDP version 2.6.4 in our cluster.
The cluster includes 285 data node machines and 3 Kafka machines.
Currently the ZooKeeper servers are installed on the master machines; they do not run on dedicated machines.
But since this is a very important production cluster,
we are thinking of separating ZooKeeper from the master machines
and installing the ZooKeeper servers on separate, dedicated machines.
Am I right here?
Created 06-28-2019 02:38 AM
ZooKeeper is a lightweight process, so it won't consume many resources.
(EDIT) However, Hortonworks makes the following recommendations for Kafka + ZooKeeper:
Here are several recommendations for ZooKeeper configuration with Kafka:
Do not run ZooKeeper on a server where Kafka is running.
When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.
Make sure you allocate sufficient JVM memory. A good starting point is 4GB.
To monitor the ZooKeeper instance, use JMX metrics.
As far as ZooKeeper high availability is concerned, you can refer to the HCC thread "How to decide, how many zookeepers should I have?", which discusses this in more detail.
It depends on your requirements; based on them, you can decide whether to run 3 or 5 (etc.) ZooKeeper hosts. More ZooKeeper servers come with a cost (every write must be acknowledged by a majority of the ensemble, so a larger ensemble means more servers on the write path), so please go through the thread above.
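To make the 3-vs-5 trade-off concrete: a ZooKeeper ensemble stays available only while a strict majority of its servers is up, and that majority must acknowledge each write. The short Python sketch below is purely illustrative (the function names are hypothetical, not part of any ZooKeeper API); it computes the quorum size and how many server failures an ensemble of n servers can survive.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble of n voting servers.
# Illustrative only: these helpers are hypothetical, not a ZooKeeper API.

def quorum_size(n: int) -> int:
    """Smallest strict majority of n servers (servers that must ack each write)."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of servers that can fail while a majority is still available."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

The table it prints shows that 4 servers survive no more failures than 3, and 6 no more than 5, while the quorum that must acknowledge every write keeps growing; this is why odd ensemble sizes such as 3 or 5 are the usual choice.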
Created 07-01-2019 06:29 AM
@Dear Jay, now it's clearer, but why? What is the reason we shouldn't install ZooKeeper on the Kafka machines?
Second, can we install ZooKeeper on a data node machine (when ZooKeeper is dedicated to Kafka and serves only Kafka)?
Created 07-01-2019 06:44 AM
Kafka and ZooKeeper are two services that are sensitive to disk I/O, so keeping them on the same node is not a good idea.
The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper must sync transactions to media before it returns a response, so a dedicated transaction log device is key to consistently good performance; putting the log on a busy device will adversely impact performance. The ZooKeeper transaction log should therefore be configured on a dedicated device. This is very important to achieve the best performance from ZooKeeper.
ZooKeeper's transaction log must be on a dedicated device. ZooKeeper writes the log sequentially, without seeking. Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.
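In zoo.cfg, the transaction log location is set with dataLogDir (separately from the snapshot directory, dataDir). The Python sketch below is one hypothetical way to sanity-check placement, not a Hortonworks or ZooKeeper tool: it reports which other directories (for example dataDir or Kafka's log.dirs) sit on the same filesystem device as the transaction log. It compares st_dev from os.stat, so it only detects directories on the same mounted filesystem; two partitions of one physical disk will still pass, so treat it as a first check rather than proof of a dedicated device.

```python
import os
import sys
from typing import Iterable, List


def shared_device_paths(txn_log_dir: str, other_dirs: Iterable[str]) -> List[str]:
    """Return the entries of other_dirs on the same filesystem device as txn_log_dir.

    os.stat().st_dev identifies the device holding a path, so equal values mean
    the same mounted filesystem. Different values do NOT prove a dedicated
    physical disk: two partitions or LVM volumes on one disk differ here but
    still share the same I/O.
    """
    log_dev = os.stat(txn_log_dir).st_dev
    return [d for d in other_dirs if os.stat(d).st_dev == log_dev]


def main(argv: List[str]) -> None:
    # Hypothetical usage (script name and paths are examples only):
    #   python check_zk_txnlog.py <dataLogDir> <dataDir> <kafka log dir> ...
    if len(argv) < 3:
        print("usage: check_zk_txnlog.py DATA_LOG_DIR OTHER_DIR [OTHER_DIR ...]")
        return
    txn_log_dir, others = argv[1], argv[2:]
    clashes = shared_device_paths(txn_log_dir, others)
    if clashes:
        print(f"{txn_log_dir} shares a device with: {', '.join(clashes)}")
    else:
        print(f"{txn_log_dir} is on a different device from all listed directories")


if __name__ == "__main__":
    main(sys.argv)
```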
Do not put ZooKeeper in a situation that can cause a swap. In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Remember, in ZooKeeper everything is ordered, so if one request hits the disk, all other queued requests hit the disk.
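A rough way to check the swap risk on a given host is sketched below in Python. It is an illustrative check under stated assumptions, not an official tool: it assumes Linux, reads /proc/meminfo, flags active swap (SwapTotal above zero), and flags a ZooKeeper max heap larger than physical RAM. The 4096 MB heap is a hypothetical input based on the 4 GB starting point mentioned earlier, and comparing against total RAM ignores memory already used by Kafka or other processes on the same host, so it is optimistic.

```python
import os
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields (MemTotal, SwapTotal, ...) are reported in kB.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        try:
            values[key.strip()] = int(parts[0])
        except ValueError:
            continue  # skip any field whose value is not an integer
    return values


def swap_risks(meminfo: Dict[str, int], zk_heap_mb: int) -> List[str]:
    """List conditions that could make a ZooKeeper JVM swap.

    The heap check compares against total RAM only, so it is optimistic on a
    shared host (e.g. one that also runs a Kafka broker).
    """
    risks: List[str] = []
    if meminfo.get("SwapTotal", 0) > 0:
        risks.append("swap is enabled (SwapTotal > 0)")
    if zk_heap_mb * 1024 > meminfo.get("MemTotal", 0):  # MB -> kB
        risks.append("ZooKeeper max heap exceeds physical RAM")
    return risks


if __name__ == "__main__":
    ZK_HEAP_MB = 4096  # hypothetical max heap; adjust to your actual -Xmx
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        for risk in swap_risks(info, ZK_HEAP_MB) or ["no obvious swap risk found"]:
            print(risk)
    else:
        print("/proc/meminfo not found (this sketch assumes Linux)")
```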
Some good discussions of this can be found in the following HCC threads/articles:
1. https://community.hortonworks.com/questions/55868/zookeeper-on-even-master-nodes.html
2. https://community.hortonworks.com/questions/2498/best-practices-for-zookeeper-placement.html
3. https://community.hortonworks.com/articles/62667/zookeeper-sizing-and-placement-draft.html
Created 07-01-2019 08:57 AM
@Dear Jay, you said: "Kafka and Zookeepers are two services that are sensitive to disk I/O. Keeping them on same node will not be a good idea."
I agree,
but let's say that:
we install ZooKeeper on a Kafka machine, but ZooKeeper writes its logs to the OS disk
and not to the Kafka disk,
and we disable swap on the Kafka machine.
Does this change the picture?
I mean, in that case ZooKeeper does not use the Kafka disk; it uses the OS disk of the Kafka machine, and swap is disabled.
In that case, would you agree?
Created 07-01-2019 09:10 AM
The doc which I shared describes standard best practice. It does not say that running ZooKeeper on a Kafka host will not work.
But as a best practice, you should keep them on separate hosts because of load constraints.
However, this is subject to your pre-production testing and metrics analysis of both scenarios; after that, you can proceed with whichever option suits your requirements.
Created 06-28-2019 07:14 AM
@Michael Bronson originally posted the above in the Community Help track. On Fri Jun 28 07:12 UTC 2019, a member of the HCC moderation staff moved it and the entire reply thread below to the Design & Architecture track. The Community Help Track is intended for questions about using the HCC site itself, not technical questions about zookeeper servers or Kafka.