Created on 09-18-2017 08:41 PM - edited 09-16-2022 05:15 AM
We have a cluster with 3 ZooKeeper servers running on, say, s1, s2, and s3. The ZooKeeper instance on s3 keeps refusing connections, but not right away after I start it. After about 3-4 hours, Ambari reports the error: Connection failed: [Errno 111] Connection refused to s3.foo.com:2181.
In the ZooKeeper log I found the entries pasted below. I have already increased the amount of space on the s3 device, and there is plenty left. Any ideas?
2017-09-18 19:48:16,333 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1308ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:18,731 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1467ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:20,164 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1431ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:21,524 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1359ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:23,696 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 2170ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:25,416 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1576ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:28,116 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 2699ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:30,369 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 2249ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:31,564 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1195ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:33,884 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1533ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:39,154 - WARN[SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 1516ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2017-09-18 19:48:41,043 - ERROR [SyncThread:3:SyncRequestProcessor@183] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:322)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:322)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:491)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:196)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:131)
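In case it helps anyone hitting the same thing: free blocks alone may not tell the whole story, because "No space left on device" can also be raised when the filesystem runs out of inodes or hits a quota, even while df shows space available. Below is a minimal sketch of the check I would run; the dataDir path /var/lib/zookeeper is an assumption, so substitute whatever dataDir your zoo.cfg actually points at.

import os

# Assumed ZooKeeper dataDir -- replace with the dataDir from your zoo.cfg.
DATA_DIR = "/var/lib/zookeeper"

st = os.statvfs(DATA_DIR)

# Space and inodes still available to unprivileged processes on this filesystem.
free_bytes = st.f_bavail * st.f_frsize
free_inodes = st.f_favail

print(f"free space on {DATA_DIR}: {free_bytes / 1024**3:.2f} GiB")
print(f"free inodes on {DATA_DIR}: {free_inodes} of {st.f_files}")

If free space looks fine but free inodes is at or near zero, the ENOSPC in the stack trace above is explained.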
Created 09-20-2017 06:08 PM
Update: I had 4 flows running at the same time against these ZooKeeper instances. When I reduced them from 4 to 2, ZooKeeper no longer crashes.
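That fix is consistent with the fsync warnings above: heavier write traffic makes ZooKeeper append to its transaction log and roll snapshots faster, so the dataDir can fill up even when the volume initially looked roomy. If old snapshots and transaction logs are accumulating there, ZooKeeper's built-in autopurge settings will clean them up periodically. These are standard zoo.cfg properties; the values below are just a reasonable starting point, not settings taken from my cluster:

# Keep only the 3 most recent snapshots and their transaction logs.
autopurge.snapRetainCount=3
# Run the purge task every 1 hour (the default of 0 disables purging).
autopurge.purgeInterval=1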