Zookeeper issues with leader shutdown in a 3-node cluster

I'm running a 3-node ZooKeeper cluster in Docker (Docker Hub image zookeeper:3.6.1). Each node is deployed as an individual service in a stack.

 

docker-compose.yml:

  zookeeper1:
    image: zookeeper:3.6.1
    hostname: "zookeeper1"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: "*"
      ZOO_STANDALONE_ENABLED: "false"
      ZOO_INIT_LIMIT: 20
      ZOO_SYNC_LIMIT: 10
    networks:
      - kafka-net
    volumes:
      - zookeeper-data:/data/
      - zookeeper-datalog:/datalog/

  zookeeper2:
    image: zookeeper:3.6.1
    hostname: "zookeeper2"
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: "*"
      ZOO_STANDALONE_ENABLED: "false"
      ZOO_INIT_LIMIT: 20
      ZOO_SYNC_LIMIT: 10
    networks:
      - kafka-net
    volumes:
      - zookeeper-data:/data/
      - zookeeper-datalog:/datalog/

  zookeeper3:
    image: zookeeper:3.6.1
    hostname: "zookeeper3"
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: "*"
      ZOO_STANDALONE_ENABLED: "false"
      ZOO_INIT_LIMIT: 20
      ZOO_SYNC_LIMIT: 10
    networks:
      - kafka-net
    volumes:
      - zookeeper-data:/data/
      - zookeeper-datalog:/datalog/

 

Everything works fine while all 3 nodes are up and running. If one of the follower nodes goes down, the remaining 2-node cluster also keeps working correctly, and clients (zkCli, Kafka, NiFi...) can connect without problems.

 

Issue:

With the 3 nodes up (fresh start), as soon as I shut down the leader, the remaining nodes vote to elect a new leader. I believe that process works as expected: I can request stat (a four-letter-word command, 4LW) from the nodes without problems.

However, if I try to connect with zkCli to any of the nodes (from inside or outside the container, using localhost:2181 or zookeeperX:2181), I get stuck in "(CONNECTING)". Other clients have the same problem: NiFi, for example, can't connect to ZooKeeper, so it goes into an unavailable state.
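
For reference, the four-letter-word check mentioned above can be reproduced without zkCli. The following is a minimal Python sketch, not part of the original setup: it opens a plain TCP connection to the client port, sends the command, and reads the reply until the server closes the connection. The function names four_letter_word and server_mode are made up for illustration. It assumes the command is whitelisted (as with ZOO_4LW_COMMANDS_WHITELIST: "*" in the compose file) and that the reply contains a "Mode:" line, as the srvr and stat commands print.

```python
import socket
from typing import Optional


def four_letter_word(host: str, port: int = 2181,
                     command: str = "srvr", timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(reply: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. leader, follower), if any."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Calling server_mode(four_letter_word("zookeeper1")) for each node shows which one reports itself as leader after the election. These commands do not establish a client session, so a node that answers them is not necessarily able to serve zkCli or NiFi, which is consistent with the behaviour described above.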

 

zkCli.sh output with 3 nodes up and running:

root@zookeeper1:/apache-zookeeper-3.6.1-bin/bin# zkCli.sh
Connecting to localhost:2181
2021-02-23 16:41:51,793 [myid:] - INFO  [main:Environment@98] - Client environment:zookeeper.version=3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on 04/21/2020 15:01 GMT
2021-02-23 16:41:51,795 [myid:] - INFO  [main:Environment@98] - Client environment:host.name=zookeeper1
2021-02-23 16:41:51,796 [myid:] - INFO  [main:Environment@98] - Client environment:java.version=11.0.8
2021-02-23 16:41:51,797 [myid:] - INFO  [main:Environment@98] - Client environment:java.vendor=N/A
2021-02-23 16:41:51,797 [myid:] - INFO  [main:Environment@98] - Client environment:java.home=/usr/local/openjdk-11
2021-02-23 16:41:51,797 [myid:] - INFO  [main:Environment@98] - Client environment:java.class.path=/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.6.1-bin/bin/../build/classes:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-prometheus-metrics-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-jute-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/snappy-java-1.1.7.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_servlet-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_hotspot-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_common-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-epoll-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-resolver-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-handler-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-codec-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-buffer-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/metrics-core-3.2.5.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-util-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../
lib/jetty-server-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-security-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-io-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-http-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-databind-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-core-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-annotations-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-lang-2.6.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:java.io.tmpdir=/tmp
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:java.compiler=<NA>
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:os.name=Linux
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:os.arch=amd64
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:os.version=3.10.0-1062.12.1.el7.x86_64
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:user.name=root
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:user.home=/root
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:user.dir=/apache-zookeeper-3.6.1-bin/bin
2021-02-23 16:41:51,798 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.free=248MB
2021-02-23 16:41:51,799 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.max=256MB
2021-02-23 16:41:51,800 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.total=256MB
2021-02-23 16:41:51,803 [myid:] - INFO  [main:ZooKeeper@1005] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@6166e06f
2021-02-23 16:41:51,807 [myid:] - INFO  [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-02-23 16:41:51,815 [myid:] - INFO  [main:ClientCnxnSocket@239] - jute.maxbuffer value is 1048575 Bytes
2021-02-23 16:41:51,827 [myid:] - INFO  [main:ClientCnxn@1703] - zookeeper.request.timeout value is 0. feature enabled=false
Welcome to ZooKeeper!
2021-02-23 16:41:51,841 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server localhost/127.0.0.1:2181.
2021-02-23 16:41:51,842 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2021-02-23 16:41:51,849 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /127.0.0.1:33314, server: localhost/127.0.0.1:2181
JLine support is enabled
2021-02-23 16:41:51,887 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1420] - Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x1016469ec7c0000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]

 

zkCli.sh output with 2 nodes after leader shutdown:

(In this test I passed the server as an argument; I can paste the output for localhost if necessary.)

root@zookeeper1:/apache-zookeeper-3.6.1-bin/bin# zkCli.sh -server zookeeper1:2181
Connecting to zookeeper1:2181
2021-02-23 17:05:21,893 [myid:] - INFO  [main:Environment@98] - Client environment:zookeeper.version=3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b, built on 04/21/2020 15:01 GMT
2021-02-23 17:05:21,896 [myid:] - INFO  [main:Environment@98] - Client environment:host.name=zookeeper1
2021-02-23 17:05:21,896 [myid:] - INFO  [main:Environment@98] - Client environment:java.version=11.0.8
2021-02-23 17:05:21,897 [myid:] - INFO  [main:Environment@98] - Client environment:java.vendor=N/A
2021-02-23 17:05:21,897 [myid:] - INFO  [main:Environment@98] - Client environment:java.home=/usr/local/openjdk-11
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:java.class.path=/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.6.1-bin/bin/../build/classes:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-prometheus-metrics-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-jute-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/zookeeper-3.6.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/snappy-java-1.1.7.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-log4j12-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/slf4j-api-1.7.25.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_servlet-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_hotspot-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient_common-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/simpleclient-0.6.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-native-epoll-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-transport-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-resolver-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-handler-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-common-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-codec-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/netty-buffer-4.1.48.Final.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/metrics-core-3.2.5.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/json-simple-1.1.1.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jline-2.11.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-util-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../
lib/jetty-server-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-security-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-io-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jetty-http-9.4.24.v20191120.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-databind-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-core-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/jackson-annotations-2.10.3.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-lang-2.6.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/commons-cli-1.2.jar:/apache-zookeeper-3.6.1-bin/bin/../lib/audience-annotations-0.5.0.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.6.1-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:java.io.tmpdir=/tmp
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:java.compiler=<NA>
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:os.name=Linux
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:os.arch=amd64
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:os.version=3.10.0-1062.12.1.el7.x86_64
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:user.name=root
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:user.home=/root
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:user.dir=/apache-zookeeper-3.6.1-bin/bin
2021-02-23 17:05:21,898 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.free=248MB
2021-02-23 17:05:21,900 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.max=256MB
2021-02-23 17:05:21,900 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.total=256MB
2021-02-23 17:05:21,903 [myid:] - INFO  [main:ZooKeeper@1005] - Initiating client connection, connectString=zookeeper1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@6166e06f
2021-02-23 17:05:21,907 [myid:] - INFO  [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-02-23 17:05:21,915 [myid:] - INFO  [main:ClientCnxnSocket@239] - jute.maxbuffer value is 1048575 Bytes
2021-02-23 17:05:21,921 [myid:] - INFO  [main:ClientCnxn@1703] - zookeeper.request.timeout value is 0. feature enabled=false
Welcome to ZooKeeper!
2021-02-23 17:05:21,934 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server zookeeper1/10.0.19.98:2181.
2021-02-23 17:05:21,935 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2021-02-23 17:05:21,946 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /10.0.19.98:44740, server: zookeeper1/10.0.19.98:2181
JLine support is enabled
[zk: zookeeper1:2181(CONNECTING) 0]

2021-02-23 17:05:51,974 [myid:zookeeper1:2181] - WARN  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1229] - Client session timed out, have not heard from server in 30028ms for session id 0x0
2021-02-23 17:05:51,979 [myid:zookeeper1:2181] - WARN  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1272] - Session 0x0 for sever zookeeper1/10.0.19.98:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 30028ms for session id 0x0
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
2021-02-23 17:05:53,430 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1154] - Opening socket connection to server zookeeper1/10.0.19.98:2181.
2021-02-23 17:05:53,430 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@1156] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
2021-02-23 17:05:53,431 [myid:zookeeper1:2181] - INFO  [main-SendThread(zookeeper1:2181):ClientCnxn$SendThread@986] - Socket connection established, initiating session, client: /10.0.19.98:44762, server: zookeeper1/10.0.19.98:2181

 

I'm stuck on this problem, and a highly available ZooKeeper seems like a dream.

Any ideas or help would be appreciated 🙂

 

Regards,

Diego

 

1 ACCEPTED SOLUTION

Solved after reading this Jira case 

 

The problem was the 0.0.0.0 address in the ZooKeeper server list (ZOO_SERVERS):

ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181

Changing it to the node's FQDN (its own hostname, zookeeper1 for node 1) solved the problem:

 

 

ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
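
To make the fix concrete, here is a small Python sketch (an illustration only, not something taken from the Jira case) that builds the ZOO_SERVERS value from the node hostnames and flags any entry that still uses the 0.0.0.0 wildcard. The helper names build_zoo_servers and wildcard_entries are hypothetical; the ports are the ones used in the compose file in the question.

```python
from typing import List, Sequence


def build_zoo_servers(hostnames: Sequence[str], quorum_port: int = 2888,
                      election_port: int = 3888, client_port: int = 2181) -> str:
    """Build a ZOO_SERVERS string that names every node by hostname."""
    return " ".join(
        f"server.{node_id}={host}:{quorum_port}:{election_port};{client_port}"
        for node_id, host in enumerate(hostnames, start=1)
    )


def wildcard_entries(zoo_servers: str) -> List[str]:
    """Return the server.N keys whose address is the 0.0.0.0 wildcard."""
    flagged = []
    for entry in zoo_servers.split():
        key, _, spec = entry.partition("=")
        host = spec.split(":", 1)[0]  # text before the first port separator
        if host == "0.0.0.0":
            flagged.append(key)
    return flagged
```

With this approach every node gets the same ZOO_SERVERS value, so only ZOO_MY_ID differs between the three services, and wildcard_entries can serve as a quick check on an existing configuration before deploying.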

 

 

Regards
