Should ZooKeeper run on independent machines in a production environment?

We have HDP version 2.6.4 in our cluster.

The cluster includes 285 data node machines and 3 Kafka machines.

For now, the ZooKeeper servers are installed on the master machines, not on independent machines.

But since this is a very important production cluster,

we are thinking of moving ZooKeeper off the master machines

and installing the ZooKeeper servers on separate, dedicated servers.

Am I right here?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

ZooKeeper is a lightweight process, so it won't consume many resources.

EDIT: However, there is the following recommendation from Hortonworks (for Kafka + ZooKeeper):

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...

Here are several recommendations for ZooKeeper configuration with Kafka:

  • Do not run ZooKeeper on a server where Kafka is running.

  • When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

  • Make sure you allocate sufficient JVM memory. A good starting point is 4GB.

  • To monitor the ZooKeeper instance, use JMX metrics.
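The recommendation above names JMX for monitoring. As a lighter-weight complement, ZooKeeper also answers the `mntr` four-letter command on its client port. The following is a minimal sketch, assuming the command is enabled on your servers (newer ZooKeeper releases require it to be whitelisted) and that the client port is the default 2181; the function names are illustrative only.

```python
import socket


def parse_mntr(text):
    """Parse the tab-separated key/value lines returned by `mntr`."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send `mntr` to one ZooKeeper server and return the parsed metrics.

    The host and port are assumptions; adjust them to your ensemble.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mntr(b"".join(chunks).decode("utf-8", errors="replace"))
```

Metrics such as `zk_avg_latency` and `zk_outstanding_requests` in the returned dictionary are the ones worth watching when ZooKeeper shares a host or a disk with another service.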


As far as high availability for ZooKeeper is concerned, you can refer to the following HCC thread, which discusses how to decide how many ZooKeeper servers you should have:

https://community.hortonworks.com/questions/35287/how-to-decide-how-many-zookeepers-should-i-have.ht...


It depends on your requirements; based on them, we can decide whether the number of ZK hosts should be 3 or 5 (etc.). More ZK servers come at a cost, so please go through the above thread.
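To make the 3-versus-5 trade-off concrete: a ZooKeeper ensemble stays available only while a majority of its servers are up, and every write must be acknowledged by that majority. A small sketch of the arithmetic (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size):
    """Smallest majority of an ensemble with `ensemble_size` servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

An even-sized ensemble tolerates no more failures than the odd size just below it (4 servers tolerate one failure, the same as 3), which is why odd sizes are the usual choice. A larger ensemble tolerates more failures but needs more acknowledgements per write.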





Dear Jay, now it's clearer, but why? What is the reason that we shouldn't install ZooKeeper on the Kafka machines?

Second, can we install ZooKeeper on a data node machine (when ZooKeeper is dedicated and serves only Kafka)?

Michael-Bronson

Master Mentor

@Michael Bronson

Kafka and ZooKeeper are two services that are sensitive to disk I/O. Keeping them on the same node is not a good idea.


The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper must sync transactions to media before it returns a response, so putting the log on a busy device will adversely impact performance. A dedicated transaction log device is key to consistently good performance.


ZooKeeper's transaction log must be on a dedicated device. ZooKeeper writes the log sequentially, without seeking. Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.
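In `zoo.cfg`, the transaction log location is controlled by `dataLogDir`; when it is not set, the log is written under `dataDir` together with the snapshots. The following is a rough sketch of how one might check a configuration for this, assuming a plain `key=value` file. The helper names are hypothetical.

```python
import os


def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def txn_log_is_separated(settings):
    """True only if dataLogDir is set and differs from dataDir."""
    log_dir = settings.get("dataLogDir")
    return bool(log_dir) and log_dir != settings.get("dataDir")


def on_same_device(path_a, path_b):
    """True if both paths live on the same filesystem (same st_dev).

    Caveat: two partitions of one physical disk report different st_dev
    values, so a False result does not prove the disks are separate.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If `on_same_device` returns True for the ZooKeeper log directory and a Kafka log directory, the two services are competing for the same filesystem; a False result still needs a manual check that the underlying physical devices differ.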


Do not put ZooKeeper in a situation that can cause a swap. In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Remember, in ZooKeeper, everything is ordered, so if one request hits the disk, all other queued requests hit the disk.
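A quick way to sanity-check a Linux host against the swap advice is to read `/proc/meminfo` and compare the configured ZooKeeper heap with physical memory. This is only a sketch: the 1024 MB reserve for the OS and other processes is an arbitrary assumption, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def swap_enabled(meminfo):
    """True if the host has any swap space configured."""
    return meminfo.get("SwapTotal", 0) > 0


def heap_fits_in_memory(heap_mb, meminfo, reserve_mb=1024):
    """True if the JVM heap plus an assumed reserve fits in physical RAM."""
    total_mb = meminfo["MemTotal"] // 1024
    return heap_mb + reserve_mb <= total_mb
```

On a real host the input would come from `open("/proc/meminfo").read()`. With the 4GB heap starting point mentioned earlier in this thread, the call would be `heap_fits_in_memory(4096, meminfo)`.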



Some good discussions on this can be found in the following HCC threads / articles:

1. https://community.hortonworks.com/questions/55868/zookeeper-on-even-master-nodes.html

2. https://community.hortonworks.com/questions/2498/best-practices-for-zookeeper-placement.html

3. https://community.hortonworks.com/articles/62667/zookeeper-sizing-and-placement-draft.html



Dear Jay, you said: "Kafka and Zookeepers are two services that are sensitive to disk I/O. Keeping them on same node will not be a good idea."


I agree.


But let's say that


we install ZooKeeper on a Kafka machine, but ZooKeeper writes its logs to the OS disk

and not to the Kafka disk,

and we disable swap on the Kafka machine.


Does this change the picture?


I mean, in that case ZooKeeper does not use the Kafka disk; it uses the OS disk of the Kafka machine, and swap is disabled.

Would you agree in that case?


Michael-Bronson

Master Mentor

@Michael Bronson

The doc I shared describes standard best practice. It does not say that running ZK on a Kafka host will not work.

But as a best practice you should keep them on separate hosts due to load constraints.

However, this is subject to testing and metrics analysis of both scenarios in your pre-production environment; you can then proceed with whichever suits your requirements.


@Michael Bronson originally posted the above in the Community Help track. On Fri Jun 28 07:12 UTC 2019, a member of the HCC moderation staff moved it and the entire reply thread below to the Design & Architecture track. The Community Help Track is intended for questions about using the HCC site itself, not technical questions about zookeeper servers or Kafka.

Bill Brooks, Community Moderator