Created on 05-09-2016 07:40 PM - edited 09-16-2022 03:18 AM
Given the best practice of separating master and slave node configuration, for the sake of argument: if you have a 2/3 master/slave node configuration, is it recommended to run 3 ZooKeeper servers, with two on the master nodes and the third installed on a slave node, or simply to install one ZooKeeper on one of the master nodes? Appreciate the input.
Created on 05-10-2016 11:11 AM - edited 08-19-2019 01:29 AM
There are two reasons for zookeeper numbers:
a) Redundancy
As Predrag mentions, if you do not have HA anyway, 1 ZooKeeper will be as fast and good enough. However, you NEED to back up the ZooKeeper data directory like you would the NameNode folder. There is a lot of information in there that you need for a working cluster.
Going for 3 makes life safer, I think.
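To make the backup advice concrete, here is a minimal sketch that archives a ZooKeeper data directory into a timestamped tarball. The function name and both paths are hypothetical; the real data directory is whatever `dataDir` points to in your `zoo.cfg`. This is an illustration under those assumptions, not a vetted backup procedure.

```python
import tarfile
import time
from pathlib import Path


def backup_zk_data_dir(data_dir: str, backup_root: str) -> Path:
    """Archive a ZooKeeper data directory into a timestamped .tar.gz file.

    Both arguments are hypothetical example paths; use the dataDir value
    from your own zoo.cfg and a backup location of your choice.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"ZooKeeper data directory not found: {src}")

    dest_dir = Path(backup_root)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name so repeated runs do not overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"zookeeper-data-{stamp}.tar.gz"

    # Store everything under the directory's own name inside the archive.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    return archive
```

You could call it from a scheduled job, for example `backup_zk_data_dir("/path/to/zookeeper/data", "/path/to/backups")`, and copy the resulting archive off the node.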
b) Performance
Not what you asked about, since you mentioned a small cluster, but just for general information:
Adding ZooKeeper nodes makes your ZooKeeper cluster slower if more than 15-30% of operations are writes. But if you have mostly reading clients, adding nodes makes your ZooKeeper cluster faster. Just in case you ever have performance problems because of too many ZooKeeper clients (highly unlikely on a smaller cluster unless you are a heavy HBase or Kafka user, assuming an older Kafka version).
http://muratbuffalo.blogspot.co.uk/2014/09/paper-summary-zookeeper-wait-free.html
Created 05-09-2016 07:58 PM
Please see this doc https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
I would deploy 2 on the masters and 1 on a slave, as you have only 2 master nodes.
I highly recommend increasing the node count to a minimum of 4 or 5 master servers, to distribute the master components and keep the option open for HA.
Created 05-10-2016 01:28 AM
If you have only one master, no HA of any components, and a few slaves, I'd use only one ZK on the master. If you have 2 masters and let's say 5-6 slaves, you can configure NN and RM HA, and install 3 ZKs, two on two masters and one on one of the slaves. So, you can decide based on HA in cluster: if no HA then only 1 ZK, if you have HA, like NN HA then 3 ZKs.
Created 05-10-2016 01:13 PM
Thanks, very insightful. Interesting to note how zookeeper starts slowing down on writes as zookeeper nodes are added. Makes sense, I didn't realize zookeeper has to write to disk before acknowledgements are sent back to master.
Created 05-10-2016 01:01 PM
ZooKeeper is used for coordination and is a critical component of the Hadoop ecosystem.
The number of ZooKeeper servers should always be odd, and the choice between 1, 3, or 5 ZK servers depends on the overall expectations for the cluster. 3 ZK servers give more reliability than 1, because the ensemble can keep running when one ZK server fails. So for a small cluster, if you want HA for ZK, use 3 ZK servers; otherwise use 1.
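The reason for odd numbers comes down to majority quorums: an ensemble stays available only while more than half of its servers are up. A small sketch of that arithmetic (the function names are mine, for illustration only):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers (or from 1 to 2) does not let you survive any additional failure, which is why even-sized ensembles are usually avoided.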
You can put the ZK daemon on any machine (master or slave), but I would prefer to put ZK on master machines first, and only then on slave machines, as master machines are generally higher-end and more reliable than slaves and do not always run compute-intensive jobs.
So, for example, if you have a 5-node cluster of which 2 are masters (M1, M2) and 3 are slaves (S1, S2, S3), then putting ZK on M1, M2 and S1 should be fine.
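As a rough sketch of the "masters first, then slaves" placement, the snippet below picks the hosts and prints the matching `server.N` lines for `zoo.cfg`. The helper names are hypothetical, the host names are the ones from the example above, and ports 2888 and 3888 are assumed here because they are the ones used in the ZooKeeper documentation examples; adjust them to your setup.

```python
from typing import List, Sequence


def place_zookeeper(masters: Sequence[str], slaves: Sequence[str],
                    ensemble_size: int) -> List[str]:
    """Pick ZooKeeper hosts, filling master nodes before slave nodes."""
    candidates = list(masters) + list(slaves)
    if ensemble_size > len(candidates):
        raise ValueError("not enough nodes for the requested ensemble size")
    return candidates[:ensemble_size]


def zoo_cfg_server_lines(hosts: Sequence[str], peer_port: int = 2888,
                         election_port: int = 3888) -> List[str]:
    """Build the server.N entries for zoo.cfg (ports are assumed values)."""
    return [f"server.{i}={host}:{peer_port}:{election_port}"
            for i, host in enumerate(hosts, start=1)]


hosts = place_zookeeper(["M1", "M2"], ["S1", "S2", "S3"], ensemble_size=3)
for line in zoo_cfg_server_lines(hosts):
    print(line)
```

Each server additionally needs a `myid` file in its data directory containing its own number N from the `server.N` line.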
Created 05-10-2016 01:42 PM
Hi Ed,
It would be useful to know if you are aiming for HA or performance. Since it is a small cluster, you may be using it as a POC and not care much about HA; I don't know. One option not mentioned below is going with 3 masters and 3 slaves in a small HA cluster setup. That allows you to balance services across the masters more and/or dedicate one to be mostly an edge node. If security is a topic, that may come in handy.
Cheers,
Christian