Should ZooKeeper be run on independent machines in a production environment?



We have HDP version 2.6.4 in our cluster.

The cluster includes 285 data node machines and 3 Kafka machines.

For now the ZooKeeper servers are installed on the master machines, so they do not run on independent machines.

But since this is a very important production cluster,

we are thinking of separating ZooKeeper from the master machines

and installing the ZooKeeper servers on separate, independent servers.

Am I right here?

Michael-Bronson
1 ACCEPTED SOLUTION

Super Mentor

@Michael Bronson

ZooKeeper is a lightweight process, so it won't consume many resources.

(EDIT): However, there is the following recommendation from Hortonworks (for Kafka + ZooKeeper):

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...

Here are several recommendations for ZooKeeper configuration with Kafka:

  • Do not run ZooKeeper on a server where Kafka is running.

  • When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

  • Make sure you allocate sufficient JVM memory. A good starting point is 4GB.

  • To monitor the ZooKeeper instance, use JMX metrics.

As far as high availability for ZooKeeper is concerned, you can refer to the following HCC thread, which discusses how to decide how many ZooKeeper servers you should have:

https://community.hortonworks.com/questions/35287/how-to-decide-how-many-zookeepers-should-i-have.ht...


It depends on your requirements; based on them we can decide whether the number of ZK hosts should be 3 or 5 (etc.). More ZK servers come with a cost, so please go through the above thread.
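To make the 3-versus-5 trade-off concrete, here is a minimal sketch of the majority-quorum arithmetic ZooKeeper relies on: an ensemble stays available only while more than half of its servers are up. The function names are made up for this illustration and are not part of any ZooKeeper tooling.

```shell
#!/bin/sh
# Majority quorum arithmetic for a ZooKeeper ensemble of N servers.
# quorum = floor(N / 2) + 1, tolerated failures = N - quorum.
# Function names are hypothetical helpers for this illustration.

zk_quorum() {
    echo $(( $1 / 2 + 1 ))
}

zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5; do
    echo "ensemble=$n quorum=$(zk_quorum "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

A 4-node ensemble tolerates the same single failure as a 3-node one, which is why odd sizes such as 3 or 5 are the usual choices.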



@Dear Jay, about "Do not run ZooKeeper on a server where Kafka is running": can you tell me which document from Hortonworks or Confluent supports this? I mean, is it an official statement?

Michael-Bronson

Super Mentor

@Michael Bronson

Do not run ZooKeeper on a server where Kafka is running.

The statement is taken from the following doc:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...

@Dear Jay, so from the document we see: "When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components."

Does this mean that if we install ZooKeeper on the Kafka machines, ZooKeeper must serve only Kafka? Am I correct?

Second, I guess it is much better to install the ZooKeeper servers on machines that do not run other services, for example on a clean machine with only the Red Hat OS. Am I right here?

Michael-Bronson

Super Mentor

@Michael Bronson

The two statements "you should dedicate ZooKeeper to Kafka" and "Do not run ZooKeeper on a server where Kafka is running" are meant to be read together.

ZooKeeper servers should not be installed or run on a Kafka broker host.

Kafka should have ZooKeeper servers dedicated to it, meaning the ZooKeeper servers that are going to be used by Kafka should not be used by other services. For example, Kafka's ZooKeeper servers should not also be used by HBase, NameNode failover, AMS, etc.

@Dear Jay, about "Make sure you allocate sufficient JVM memory. A good starting point is 4GB.": which parameter in Ambari do we need to change in order to set the value to 4 GB?

Michael-Bronson

Super Mentor

@Michael Bronson
ZooKeeper memory-related settings can be specified in the zookeeper-env script (via the Ambari zookeeper-env template):

# grep 'SERVER_JVMFLAGS' /etc/zookeeper/3.1.0.0-78/0/zookeeper-env.sh
export SERVER_JVMFLAGS=-Xmx4096m
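
As a small follow-up to the grep above, the sketch below pulls just the maximum heap flag out of a zookeeper-env.sh style file so it can be compared against the 4 GB starting point. The function name zk_max_heap and the temporary demo file are assumptions made for this example; on a real host you would point it at the zookeeper-env.sh that your installation actually uses.

```shell
#!/bin/sh
# Print the -Xmx flag found on the SERVER_JVMFLAGS line of an env file.
# zk_max_heap is a hypothetical helper, not part of ZooKeeper or Ambari.
zk_max_heap() {
    grep 'SERVER_JVMFLAGS' "$1" | grep -o -e '-Xmx[0-9]*[kKmMgG]' | tail -n 1
}

# Demo against a temporary file that mimics the line shown above.
demo_env=$(mktemp)
echo 'export SERVER_JVMFLAGS=-Xmx4096m' > "$demo_env"
zk_max_heap "$demo_env"
rm -f "$demo_env"
```

If the function prints nothing, no -Xmx value is set on that line and the JVM default heap size applies.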


@Dear Jay, I just want to clarify this.

We want to install 3 ZooKeeper servers that serve only Kafka (and no other application).

In that case, can we install the 3 ZooKeeper servers on the 3 Kafka hosts?


Or do we need to dedicate new hosts (without Kafka) to the new ZooKeeper servers?


If we can't install the ZooKeeper servers (that serve only Kafka) on the Kafka hosts,

can you please explain why?


Michael-Bronson

Super Mentor

@Michael Bronson

As per standard recommendation/ best practice: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...


For example:

1). Do not run ZooKeeper on a server where Kafka is running.

If Kafka brokers are installed on node1, node2, and node3, then you should have the ZooKeeper servers on other cluster nodes where Kafka is not installed, such as node4, node5, and node6.


2). When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

This means the ZooKeeper servers running on node4, node5, and node6 should be dedicated to Kafka; they should not be used for other purposes such as HBase, NameNode failover, AMS, etc.
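
On the Kafka side, the brokers find their ZooKeeper ensemble through the zookeeper.connect broker property, which takes a comma-separated host:port list. The sketch below only builds that string for the example hosts node4, node5, and node6; the helper name is made up, and 2181 is assumed because it is ZooKeeper's default client port. In an Ambari-managed cluster this value is normally set through the Kafka configuration in Ambari rather than by hand.

```shell
#!/bin/sh
# Join host names into a host:port list for Kafka's zookeeper.connect.
# zk_connect_string is a hypothetical helper for this illustration.
zk_connect_string() {
    port=$1
    shift
    out=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        out="${out:+$out,}$host:$port"
    done
    echo "$out"
}

echo "zookeeper.connect=$(zk_connect_string 2181 node4 node5 node6)"
# prints: zookeeper.connect=node4:2181,node5:2181,node6:2181
```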

@Dear Jay, now it's clearer. But why? What is the reason that we shouldn't install ZooKeeper on the Kafka hosts?



Second, can we install ZooKeeper on a data node machine (when ZooKeeper is dedicated to and serves only Kafka)?

Michael-Bronson

Super Mentor

@Michael Bronson

Kafka and ZooKeeper are both sensitive to disk I/O, so keeping them on the same node is not a good idea.


The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper must sync transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely impact performance. The ZooKeeper transaction log must be configured on a dedicated device. This is very important to achieve the best performance from ZooKeeper.


ZooKeeper's transaction log must be on a dedicated device. ZooKeeper writes the log sequentially, without seeking. Sharing your log device with other processes can cause seeks and contention, which in turn can cause multi-second delays.


Do not put ZooKeeper in a situation that can cause a swap. In order for ZooKeeper to function with any sort of timeliness, it simply cannot be allowed to swap. Remember, in ZooKeeper, everything is ordered, so if one request hits the disk, all other queued requests hit the disk.
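
In zoo.cfg the snapshot directory is set with dataDir and the transaction log directory with dataLogDir; when dataLogDir is not set, the transaction log is written under dataDir. As a rough sketch, the check below reads a zoo.cfg style file and reports whether a separate log directory is configured. The helper names and the demo paths are hypothetical, and a separate directory only helps if it really sits on its own device, which you can confirm on the host with df.

```shell
#!/bin/sh
# Read one key from a zoo.cfg style "key=value" file (last match wins).
# Both helpers are hypothetical names used only for this illustration.
zk_cfg_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Report whether the transaction log has its own directory configured.
zk_check_log_dir() {
    data_dir=$(zk_cfg_get "$1" dataDir)
    log_dir=$(zk_cfg_get "$1" dataLogDir)
    if [ -z "$log_dir" ] || [ "$log_dir" = "$data_dir" ]; then
        echo "shared: transaction log shares $data_dir with snapshots"
    else
        echo "separate: transaction log in $log_dir, snapshots in $data_dir"
    fi
}

# Demo with made-up paths; replace with the zoo.cfg of your installation.
demo_cfg=$(mktemp)
printf 'dataDir=/hadoop/zookeeper\ndataLogDir=/zk-txlog\n' > "$demo_cfg"
zk_check_log_dir "$demo_cfg"
rm -f "$demo_cfg"
```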



Some good discussions on this can be found in the following HCC threads and articles:

1. https://community.hortonworks.com/questions/55868/zookeeper-on-even-master-nodes.html

2. https://community.hortonworks.com/questions/2498/best-practices-for-zookeeper-placement.html

3. https://community.hortonworks.com/articles/62667/zookeeper-sizing-and-placement-draft.html


@Dear Jay, you said: "Kafka and Zookeepers are two services that are sensitive to disk I/O. Keeping them on same node will not be a good idea."


I agree.


But let's say that:


we install ZooKeeper on a Kafka machine, but ZooKeeper writes its logs to the OS disk

and not to the Kafka disks,

and we disable swap on the Kafka machine.


Does this change the picture?


I mean, in that case ZooKeeper does not use the Kafka disks; it uses the OS disk of the Kafka machine, and swap is disabled.

In that case, do you agree?


Michael-Bronson

Super Mentor

@Michael Bronson

The doc I shared describes the standard best practice. It does not say that running ZK on a Kafka host will not work.

But as a best practice you should keep them on separate hosts due to load constraints.

However, this is subject to testing and metrics analysis of both scenarios in your pre-prod environment; you can then proceed with whichever suits your requirements.

@Michael Bronson originally posted the above in the Community Help track. On Fri Jun 28 07:12 UTC 2019, a member of the HCC moderation staff moved it and the entire reply thread below to the Design & Architecture track. The Community Help Track is intended for questions about using the HCC site itself, not technical questions about zookeeper servers or Kafka.

Bill Brooks, Community Moderator