ZooKeeper on an even number of master nodes

Expert Contributor

Hello,

If I have a setup with only 2 master nodes and, say, 3 data nodes, what do you think would be the best option?

1. Just have 2 ZooKeeper servers, one on each of the master nodes

2. Use a data node to get a quorum of 3 ZooKeeper servers

3. Any other options?

Thanks,

1 ACCEPTED SOLUTION

Super Guru

@mliem

I will tackle your questions in order, for your 2 + 3 cluster:

1. If you want HA for your 2 + 3 cluster, then use 3 ZooKeeper servers. If HA is not a requirement, then 1. The number needs to be odd to meet quorum requirements. Even numbers can lead to split-brain.

2. ZooKeeper Server is considered a MASTER component in Ambari terminology. As such, use two of your master nodes and 1 data node to reach the 3 ZooKeeper servers required for HA. If non-HA, then place it on one of your master nodes.

3. No other options in your case. If three master nodes were available, the third ZooKeeper server should go on the third master node.
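
To make the 2 masters + 1 data node layout concrete, here is a minimal sketch that prints the kind of `zoo.cfg` a 3-server ensemble uses. Ambari normally generates this file for you, so this is only an illustration. The hostnames, the `dataDir` path, and the `render_zoo_cfg` helper are assumptions made up for the example; the `server.N=host:port:port` lines and the other keys are standard ZooKeeper settings.

```python
def render_zoo_cfg(hosts, peer_port=2888, election_port=3888):
    """Return zoo.cfg text for a replicated ensemble on the given hosts."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        "dataDir=/var/lib/zookeeper",  # assumed path, adjust to your install
        "clientPort=2181",
    ]
    # One server.N line per ensemble member; N must match that host's myid file.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines)


# Hypothetical hostnames: two master nodes plus one data node.
ENSEMBLE = ["master1.example.com", "master2.example.com", "data1.example.com"]

if __name__ == "__main__":
    print(render_zoo_cfg(ENSEMBLE))
```

Each host also needs a `myid` file in its `dataDir` containing its own server number (1, 2, or 3 here).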

I simplified my responses using the assumption that you would use just basic services like HDFS and Hive. Even if you were using Kafka and Storm, the responses wouldn't change for your very small cluster. If the cluster were larger, then you could consider allocating separate ZooKeeper ensembles for Kafka and Storm.

If you were using HBase, then the story would be slightly different, even for your small cluster. Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. For HBase, it's better to have the ZooKeeper servers on region servers, which in your case are probably what you call data nodes. For a cluster that runs many services, a 2 + 3 layout is a stretch for best practices.

If any of the responses to your question helped, don't forget to vote and accept the answer. If you fix the issue on your own, don't forget to post the answer to your own question. A moderator will review it and accept it.

4 REPLIES


@mliem

It's always recommended to have 3 ZooKeeper servers. Are you installing HBase?

When setting up the ZooKeeper quorum, the reasonable numbers are 1, 3 and 5 nodes.

1 is useful if you don't want redundancy at all. This happens, for instance, on the sandbox version, where you have only a single node in the cluster.

3 is useful for failure tolerance, but it is sensitive to hardware failure during maintenance when you might have one machine down.

5 is used in large, high-value clusters which need to stay up at all costs.

It is very rare to use more than 5 Zookeeper nodes in a cluster.
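
The sizes above follow from majority arithmetic: an ensemble stays available only while a strict majority of its servers is up. Here is a rough sketch of that calculation; the function names are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that 2 servers tolerate no failures, just like 1, and 4 tolerate only one, just like 3. An even-sized ensemble adds a machine without adding failure tolerance, which is why odd sizes are the usual choice.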

Super Guru

In any case, you should go with a 3-node ZooKeeper ensemble. With 3 nodes, 2 servers must be up at any time to keep quorum. The more members an ensemble has, the more tolerant it is of host failures.

New Contributor

@Constantin Stanca

Hi, could you please explain why there could be a split-brain situation when the number of ZooKeeper nodes is even? Thanks~