The ZooKeeper service canary failed for an unknown reason.

New Contributor

The ZooKeeper canary failed and the quorum membership check failed. I have 3 ZK nodes. I tested the connection between the nodes using telnet, and ports 2181 and 4181 are open.

[Two screenshots attached: amrahmed_0-1696349494128.png, amrahmed_1-1696349944443.png]

Output of the log file:

2023-09-29 23:21:12,409 INFO org.apache.zookeeper.server.PurgeTxnLog: Removing file: Sep 22, 2023 11:21:12 PM /var/lib/zookeeper/version-2/snapshot.1ef00000006
2023-09-29 23:21:12,409 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task completed.
2023-09-30 23:21:12,401 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task started.
2023-09-30 23:21:12,404 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: zookeeper.snapshot.trust.empty : false
2023-09-30 23:21:12,407 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task completed.
2023-10-01 07:48:51,957 WARN org.apache.zookeeper.server.persistence.FileTxnLog: fsync-ing the write ahead log in SyncThread:1 took 1479ms which will adversely effect operation latency. File size is 134217744 bytes. See the ZooKeeper troubleshooting guide
2023-10-01 07:55:15,621 WARN org.apache.zookeeper.server.persistence.FileTxnLog: fsync-ing the write ahead log in SyncThread:1 took 1119ms which will adversely effect operation latency. File size is 134217744 bytes. See the ZooKeeper troubleshooting guide
2023-10-01 12:17:10,659 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 0x1ef00073933 to /var/lib/zookeeper/version-2/snapshot.1ef00073933
2023-10-01 12:17:10,666 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.1ef00073935
2023-10-01 23:21:12,402 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task started.
2023-10-01 23:21:12,406 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: zookeeper.snapshot.trust.empty : false
2023-10-01 23:21:12,410 INFO org.apache.zookeeper.server.PurgeTxnLog: Removing file: Sep 24, 2023 1:22:34 PM /var/lib/zookeeper/version-2/log.1ef00000007
2023-10-01 23:21:12,457 INFO org.apache.zookeeper.server.PurgeTxnLog: Removing file: Sep 24, 2023 1:22:34 PM /var/lib/zookeeper/version-2/snapshot.1ef00014d4a
2023-10-01 23:21:12,458 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task completed.
2023-10-02 17:12:12,737 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.1ef00084886
2023-10-02 17:12:12,737 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 0x1ef00084885 to /var/lib/zookeeper/version-2/snapshot.1ef00084885
2023-10-02 23:21:12,402 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task started.
2023-10-02 23:21:12,403 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: zookeeper.snapshot.trust.empty : false
2023-10-02 23:21:12,405 INFO org.apache.zookeeper.server.PurgeTxnLog: Removing file: Sep 25, 2023 11:16:19 AM /var/lib/zookeeper/version-2/log.1ef00014d4b
2023-10-02 23:21:12,420 INFO org.apache.zookeeper.server.PurgeTxnLog: Removing file: Sep 25, 2023 11:16:19 AM /var/lib/zookeeper/version-2/snapshot.1ef000220d9
2023-10-02 23:21:12,422 INFO org.apache.zookeeper.server.DatadirCleanupManager: Purge task completed.
2023-10-03 14:10:56,371 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 0x1ef000910ed to /var/lib/zookeeper/version-2/snapshot.1ef000910ed
2023-10-03 14:10:57,837 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.1ef000910ef
2023-10-03 18:29:07,973 WARN org.apache.zookeeper.server.persistence.FileTxnLog: fsync-ing the write ahead log in SyncThread:1 took 2091ms which will adversely effect operation latency. File size is 67108880 bytes. See the ZooKeeper troubleshooting guide
2023-10-03 19:22:07,227 INFO org.apache.zookeeper.server.quorum.QuorumCnxManager: Received connection request /10.23.81.79:60652
2023-10-03 19:22:07,230 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Exception reading or writing challenge: {}
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:533)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:487)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:874)
2023-10-03 19:24:03,603 WARN org.apache.zookeeper.server.NIOServerCnxn: Unable to read additional data from client sessionid 0x100167525b60000, likely client has closed socket

Super Collaborator

@amrahmed The ZooKeeper snapshot may have grown large enough that the followers cannot finish syncing with the leader within the configured limits. Try increasing initLimit and syncLimit for ZooKeeper and check again.

In ZooKeeper => Configuration, search for 'limit' and increase initLimit and syncLimit:

- initLimit from 10 to 30

- syncLimit from 5 to 25

Then restart ZooKeeper.
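
For context, both limits are counted in ticks, so the time they actually allow depends on tickTime. Here is a rough sketch of the arithmetic, assuming tickTime is 2000 ms (the value in ZooKeeper's sample configuration; check what your cluster really uses). The limit_ms helper is just an illustrative name:

```shell
#!/bin/sh
# Convert a ZooKeeper limit expressed in ticks into milliseconds.
# $1 = limit in ticks (initLimit or syncLimit), $2 = tickTime in ms
limit_ms() {
  echo $(( $1 * $2 ))
}

# Assumption: tickTime=2000 ms; replace with the value from your configuration.
TICK_TIME_MS=2000

echo "initLimit=10 -> $(limit_ms 10 "$TICK_TIME_MS") ms"
echo "initLimit=30 -> $(limit_ms 30 "$TICK_TIME_MS") ms"
echo "syncLimit=5  -> $(limit_ms 5 "$TICK_TIME_MS") ms"
echo "syncLimit=25 -> $(limit_ms 25 "$TICK_TIME_MS") ms"
```

Under that assumption, the change gives followers 60 seconds instead of 20 to connect and sync with the leader, and lets them lag by up to 50 seconds instead of 10 before being dropped.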

avatar
New Contributor

Thanks for your feedback. I tried this solution, but unfortunately it didn't solve the problem. It could be that the cloudera-service-agent is not able to get the status of ZooKeeper correctly.

Super Collaborator

Can you run the command below on all 3 ZooKeeper instances?

echo "stat" | nc localhost 2181 | grep Mode
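
If it is easier to check all three from one machine, the same command can be wrapped in a small loop. This is only a sketch: zk_mode and check_ensemble are illustrative names, the hostnames in the usage comment are placeholders, and it assumes nc supports the -w timeout option and that the client port is 2181.

```shell
#!/bin/sh
# Read `stat` output on stdin and print the value of its "Mode:" line,
# or "unknown" if no such line is present (for example, no reply at all).
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2; found = 1 } END { if (!found) print "unknown" }'
}

# Send the `stat` four-letter command to each host passed as an argument.
# Assumption: client port 2181 and a 5-second nc timeout.
check_ensemble() {
  for host in "$@"; do
    mode=$(echo "stat" | nc -w 5 "$host" 2181 2>/dev/null | zk_mode)
    echo "$host: $mode"
  done
}

# Usage (placeholder hostnames, replace with your three ZooKeeper nodes):
# check_ensemble zk1.example.com zk2.example.com zk3.example.com
```

A healthy 3-node ensemble should report one leader and two followers. A node that prints "unknown" either did not answer or is not serving requests. On newer ZooKeeper releases the stat command may also need to be allowed through 4lw.commands.whitelist before it returns anything.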