Created on 12-15-2017 07:46 AM - edited 09-16-2022 05:38 AM
Hi,
We have a new cluster with CDH 5.11.2 running only the Kafka and ZooKeeper services. ZooKeeper occasionally goes bad with the error below. Afterwards the Kafka brokers show as green (healthy) in Cloudera Manager, but they are actually bad.
Zookeeper log:
2017-12-14 18:31:00,004 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /xx.xxx.x.xx:33488 which had sessionid 0x25fd2436dd488c5
2017-12-14 18:31:03,691 ERROR org.apache.zookeeper.server.quorum.LearnerHandler: Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:499)
2017-12-14 18:31:03,691 WARN org.apache.zookeeper.server.quorum.LearnerHandler: ******* GOODBYE /xx.xxx.x.xx:58030 ********
2017-12-14 18:31:18,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x360200cd8173815, timeout of 30000ms exceeded
2017-12-14 18:31:18,000 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x360200cd8173815
Kafka Log:
2017-12-14 18:31:11,274 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: SUSPENDED
2017-12-14 18:31:17,275 ERROR org.apache.curator.ConnectionState: Connection timed out for connection string (xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181/kafkadev) and timeout (6000) / elapsed (6002)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:195)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:821)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:807)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:63)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-12-14 18:31:18,275 ERROR org.apache.curator.ConnectionState: Connection timed out for connection string (xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181/kafkadev) and timeout (6000) / elapsed (7002)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:195)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:821)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:807)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:63)
Created 05-11-2018 01:31 AM
The issue is caused by ZooKeeper session timeouts for the brokers. When a broker's session expires, its znode is removed from ZooKeeper, but the broker itself is still available for producing and consuming. To solve the issue, increase the ZooKeeper client session timeout to a reasonable value. CM doesn't alert when the broker znodes are missing from ZooKeeper; this alerting seems to be available from CM 5.14.x.
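Kafka registers each broker as an ephemeral znode under `/brokers/ids` (here under the `/kafkadev` chroot from the connection string above), so an expired session deletes that znode even though the broker process keeps serving clients. On a CM version without this alert, one way to catch it is to compare the broker IDs you expect with the children of that path. A minimal sketch, assuming you fetch the children yourself (for example with the kazoo client's `get_children`); the function name and broker IDs are hypothetical:

```python
from typing import Iterable, List


def missing_broker_ids(expected: Iterable[int], registered: Iterable[str]) -> List[int]:
    """Return expected broker IDs that have no znode under <chroot>/brokers/ids.

    `registered` is the list of child names of that path, e.g. what
    KazooClient.get_children("/kafkadev/brokers/ids") would return.
    Each child name is a broker ID as a string.
    """
    present = {int(name) for name in registered}
    # Anything expected but not present has lost its ephemeral znode.
    return sorted(set(expected) - present)


if __name__ == "__main__":
    # Hypothetical broker.id values; replace with your cluster's IDs.
    expected_ids = [101, 102, 103]
    # Example children list in which broker 102's znode has expired.
    children = ["101", "103"]
    missing = missing_broker_ids(expected_ids, children)
    if missing:
        print(f"Brokers missing from ZooKeeper: {missing}")
    else:
        print("All expected brokers are registered")
```

Running it periodically (for example from cron) and alerting when the list is non-empty gives a stopgap check. The fix itself is still the larger session timeout; on the broker side that is the `zookeeper.session.timeout.ms` property.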