Should ZooKeeper run on independent machines in a production environment?

We have HDP version 2.6.4 in our cluster.

The cluster includes 285 data node machines and 3 Kafka machines.

For now, the ZooKeeper servers are installed on the master machines rather than on independent machines.

Since this is a very important production cluster, we are thinking of separating ZooKeeper from the master machines and installing the ZooKeeper servers on separate, independent servers.

Am I right here?

Michael-Bronson
1 ACCEPTED SOLUTION

Master Mentor

@Michael Bronson

ZooKeeper is a lightweight process, so it won't consume many resources.

(EDIT) However, there is the following recommendation from Hortonworks (for Kafka + ZooKeeper):

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...

Here are several recommendations for ZooKeeper configuration with Kafka:

  • Do not run ZooKeeper on a server where Kafka is running.

  • When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

  • Make sure you allocate sufficient JVM memory. A good starting point is 4GB.

  • To monitor the ZooKeeper instance, use JMX metrics.


As far as high availability for ZooKeeper is concerned, you can refer to the following HCC thread, which discusses how to decide how many ZooKeeper servers to have:

https://community.hortonworks.com/questions/35287/how-to-decide-how-many-zookeepers-should-i-have.ht...


It depends on your requirements; based on them, we can decide on the number of ZK hosts (3, 5, etc.). More ZK servers come with a cost, so please go through the above thread.
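To make the choice between 3 and 5 hosts more concrete, here is a minimal sketch of the majority arithmetic. It relies on the fact that a ZooKeeper ensemble stays available only while a strict majority of its servers is up; the function names are hypothetical helpers, not part of any ZooKeeper tool.

```shell
#!/usr/bin/env bash
# Quorum size and tolerated failures for a ZooKeeper ensemble of N servers.
# The ensemble stays available only while a strict majority of servers is up.

# Smallest number of servers that forms a majority of N.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority still remains.
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5 6 7; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

An even-sized ensemble tolerates no more failures than the odd size just below it (4 servers tolerate 1 failure, the same as 3), which is why odd sizes such as 3 or 5 are the usual choice.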


14 REPLIES

Dear Jay, regarding "Do not run ZooKeeper on a server where Kafka is running": can you tell me which document from Hortonworks or Confluent supports this? I mean, is it an official statement?

Michael-Bronson

Master Mentor

@Michael Bronson

"Do not run ZooKeeper on a server where Kafka is running."

This statement is taken from the following doc:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...


Dear Jay, so the document says: "When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components."

Does this mean that if we install ZooKeeper on the Kafka machines, then that ZooKeeper must serve only Kafka? Am I correct?

Second, I guess it is much better to install the ZooKeeper servers on machines that do not run other services, for example on a clean machine with only the Red Hat OS. Am I right here?

Michael-Bronson

Master Mentor

@Michael Bronson

The statements "you should dedicate ZooKeeper to Kafka" and "Do not run ZooKeeper on a server where Kafka is running" are meant to be read together.

ZooKeeper servers should not be installed or run on a Kafka broker host.

Kafka should have ZooKeeper servers dedicated to it, meaning the ZooKeeper servers used by Kafka should not be used by other services. For example, Kafka's ZooKeeper servers should not also be used by HBase, NameNode failover, AMS, etc.


Dear Jay, regarding "Make sure you allocate sufficient JVM memory. A good starting point is 4GB.": which parameter in Ambari do we need to change in order to set the value to 4G?

Michael-Bronson

Master Mentor

@Michael Bronson
ZooKeeper memory-related settings can be specified in the zookeeper-env script (via the Ambari zookeeper-env template). For example:

# grep 'SERVER_JVMFLAGS' /etc/zookeeper/3.1.0.0-78/0/zookeeper-env.sh
export SERVER_JVMFLAGS=-Xmx4096m
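
As a small follow-up to the grep output above, the sketch below pulls just the -Xmx value out of a SERVER_JVMFLAGS line so it can be compared with the 4GB starting point. The helper name xmx_from_line is hypothetical, and the sample line is simply copied from the output above; on a real host you would feed it the line from your own zookeeper-env.sh.

```shell
#!/usr/bin/env bash
# Print the -Xmx setting found in a SERVER_JVMFLAGS line, or nothing if absent.
xmx_from_line() {
  printf '%s\n' "$1" | grep -o -e '-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}' | head -n 1
}

# Sample line copied from the grep output above (not read from a live host).
sample='export SERVER_JVMFLAGS=-Xmx4096m'
echo "max heap flag: $(xmx_from_line "$sample")"
```

An empty result would suggest that no explicit maximum heap is set in that line, in which case the JVM default applies.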



Dear Jay, I just want to clarify this.

We want to install 3 ZooKeeper servers that serve only Kafka (and no other application).

In that case, can we install the 3 ZooKeeper servers on the 3 Kafka hosts?


Or do we need to dedicate new hosts (without Kafka) to the new ZooKeeper servers?


If we can't install the ZooKeeper servers (which serve only Kafka) on the Kafka hosts, can you please explain why?


Michael-Bronson

Master Mentor

@Michael Bronson

As per the standard recommendation / best practice: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_kafka-component-guide/content/kafka-zook...


For example:

1). Do not run ZooKeeper on a server where Kafka is running.

If Kafka brokers are installed on node1, node2, and node3, then you should run ZooKeeper on other cluster nodes where Kafka is not installed, such as node4, node5, and node6.


2). When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

This means the ZooKeeper servers running on node4, node5, and node6 should be dedicated to Kafka and should not be used for other purposes such as HBase, NameNode failover, AMS, etc.
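
The node layout in this example can be sanity-checked with a short script. This is only a sketch: the host names come from the example above, the helper functions are hypothetical, and port 2181 is assumed because it is ZooKeeper's default client port. The script checks that no host carries both roles and, if so, prints the value the brokers would use for Kafka's zookeeper.connect property.

```shell
#!/usr/bin/env bash
# Hosts from the example: Kafka brokers on node1-3, ZooKeeper on node4-6.
kafka_hosts="node1 node2 node3"
zk_hosts="node4 node5 node6"

# Print every host that appears in both space-separated lists.
shared_hosts() {
  local a b
  for a in $1; do
    for b in $2; do
      [ "$a" = "$b" ] && echo "$a"
    done
  done
  return 0
}

# Join hosts into a zookeeper.connect value, assuming client port 2181.
zk_connect() {
  local h out=""
  for h in $1; do
    out="${out:+$out,}$h:2181"
  done
  echo "$out"
}

overlap=$(shared_hosts "$kafka_hosts" "$zk_hosts")
if [ -n "$overlap" ]; then
  echo "ZooKeeper shares hosts with Kafka: $overlap"
else
  echo "zookeeper.connect=$(zk_connect "$zk_hosts")"
fi
```

Whether these ZooKeeper servers are also dedicated to Kafka cannot be seen from host names alone; that depends on no other service (HBase, NameNode failover, AMS, etc.) being configured to use the same ensemble.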