Hi everyone,
We have a NiFi cluster with 3 nodes that was functioning fine until we encountered the following error. The cluster uses an embedded ZooKeeper for coordination. The error logs indicate issues with connection loss and leadership. Here are the relevant log entries:
2024-06-19 16:25:05,335 WARN [Process Cluster Protocol Request-25] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from nifi01 due to org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling protocol message in response to message type: HEARTBEAT due to java.net.SocketException: Relais brisé (pipe) (Write failed)
org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling protocol message in response to message type: HEARTBEAT due to java.net.SocketException: Relais brisé (pipe) (Write failed)
at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:186)
at org.apache.nifi.io.socket.SocketListener$2$1.run(SocketListener.java:131)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
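The cluster was operating normally before this issue arose. Now it appears to be having trouble with leadership roles and with communication between nodes. (The "Relais brisé (pipe)" text in the log is our system's French-locale message for "Broken pipe".)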
Questions:
- What could be causing this connection problem?
- How can we troubleshoot and resolve this problem to restore normal cluster operations?
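As a starting point for troubleshooting, a basic TCP reachability check between the nodes could be sketched as follows. This is only a sketch under assumptions: the host names nifi02 and nifi03 are hypothetical (only nifi01 appears in the log), port 2181 is the usual ZooKeeper client port, and 11443 is a placeholder for whatever nifi.cluster.node.protocol.port is set to in nifi.properties. The helper names can_connect and check_nodes are made up for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_nodes(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to whether it is reachable."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}


if __name__ == "__main__":
    # Assumed values: replace with the real node names and configured ports.
    hosts = ["nifi01", "nifi02", "nifi03"]
    ports = [2181, 11443]
    for (host, port), ok in check_nodes(hosts, ports).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Running this from each of the three nodes in turn would show whether a port is unreachable in one direction only. It only tests that a TCP connection can be opened, so it would not detect a connection that is dropped later (which is what a broken pipe suggests), but it can rule out a firewall or a node that is down.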