Zookeeper issue with java.net.SocketTimeoutException: Read timed out

Expert Contributor

Hi,

We have a new cluster running CDH 5.11.2 with only the Kafka and ZooKeeper services. ZooKeeper occasionally goes bad with the error below, and the Kafka brokers then show green in Cloudera Manager even though they are actually bad.

ZooKeeper log:

2017-12-14 18:31:00,004 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /xx.xxx.x.xx:33488 which had sessionid 0x25fd2436dd488c5
2017-12-14 18:31:03,691 ERROR org.apache.zookeeper.server.quorum.LearnerHandler: Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:499)
2017-12-14 18:31:03,691 WARN org.apache.zookeeper.server.quorum.LearnerHandler: ******* GOODBYE /xx.xxx.x.xx:58030 ********
2017-12-14 18:31:18,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x360200cd8173815, timeout of 30000ms exceeded
2017-12-14 18:31:18,000 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x360200cd8173815

Kafka log:

2017-12-14 18:31:11,274 INFO org.apache.curator.framework.state.ConnectionStateManager: State change: SUSPENDED
2017-12-14 18:31:17,275 ERROR org.apache.curator.ConnectionState: Connection timed out for connection string (xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181/kafkadev) and timeout (6000) / elapsed (6002)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:195)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:821)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:807)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:63)
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-12-14 18:31:18,275 ERROR org.apache.curator.ConnectionState: Connection timed out for connection string (xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181,xxxxxxxxx.devkafka.pre.corp:2181/kafkadev) and timeout (6000) / elapsed (7002)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:195)
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:821)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:807)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:63)

1 ACCEPTED SOLUTION

Expert Contributor

The issue is caused by the brokers' ZooKeeper sessions timing out. The broker znodes are removed from ZooKeeper, but the brokers remain available for producing and consuming. To solve it, increase the ZooKeeper client session timeout to a reasonable value. CM doesn't alert when the broker znodes are missing from ZooKeeper; this alerting seems to be available from CM 5.14.x.
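
One thing to watch when raising the timeout: the ZooKeeper server clamps whatever session timeout a client requests into the range [minSessionTimeout, maxSessionTimeout], which default to 2 x tickTime and 20 x tickTime. The sketch below is only an illustration of that rule; the function name is made up, and the tickTime of 2000 ms is an assumption (check the value in your own ZooKeeper configuration). On the Kafka side the requested value normally comes from the broker property zookeeper.session.timeout.ms, though which setting applies in a given CDH deployment is also an assumption here.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Return the session timeout the ZooKeeper server would grant.

    Hypothetical helper, not part of any library. It mirrors the
    documented rule: the requested timeout is clamped between
    minSessionTimeout (default 2 * tickTime) and
    maxSessionTimeout (default 20 * tickTime).
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    # Assumed tickTime of 2000 ms: the allowed range is 4000..40000 ms.
    for requested in (6000, 30000, 60000):
        granted = negotiated_session_timeout(requested)
        print(f"requested={requested} ms -> granted={granted} ms")
```

With these assumed defaults, asking for 60000 ms would be granted only 40000 ms, so a larger client timeout may also require raising maxSessionTimeout (or tickTime) on the ZooKeeper servers.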
