HDF NiFi 1.9 clustering very slow: nodes take a long time to join the cluster

Contributor

NiFi 1.9

HDF 3.4.1

6-node cluster, 16 cores, 64 GB memory

5 ZooKeeper nodes, 2 cores, 8 GB memory

 

After restarting the new cluster, the nodes take a very long time to join. Unless I bring the nodes up one by one, the cluster does not form.

 

nifi.cluster.node.connection.timeout 120 sec
nifi.cluster.node.max.concurrent.requests 400
nifi.cluster.node.protocol.max.threads 100
nifi.cluster.node.protocol.threads 50
nifi.cluster.node.read.timeout 120s

nifi.zookeeper.connect.timeout 60s
nifi.zookeeper.session.timeout 60s

nifi.cluster.load.balance.comms.timeout 60s
nifi.cluster.node.connection.timeout 120s
nifi.cluster.node.read.timeout 120s
 
memory 40GB
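
The settings listed above write the same duration two ways (`120 sec` and `120s`) and repeat two keys. As a rough sketch for checking a pasted settings list for repeated keys, the Python below normalizes durations to seconds and reports any key that appears more than once. The helper names (`to_seconds`, `collect`, `duplicates`) and the small unit table are assumptions made for this illustration and are not part of NiFi; only the time-valued settings from the post are included in the sample.

```python
import re

# Time-valued settings copied from the post (name and value separated by whitespace).
POSTED = """\
nifi.cluster.node.connection.timeout 120 sec
nifi.cluster.node.read.timeout 120s
nifi.zookeeper.connect.timeout 60s
nifi.zookeeper.session.timeout 60s
nifi.cluster.load.balance.comms.timeout 60s
nifi.cluster.node.connection.timeout 120s
nifi.cluster.node.read.timeout 120s
"""

# Illustrative subset of duration units; extend as needed.
_UNITS = {"ms": 0.001, "millis": 0.001, "s": 1, "sec": 1, "secs": 1,
          "seconds": 1, "min": 60, "mins": 60}


def to_seconds(value):
    """Convert a duration such as '120 sec' or '60s' to seconds."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]+)\s*", value)
    if not m or m.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def collect(text):
    """Map each setting name to the list of values (in seconds) it was given."""
    values = {}
    for line in text.splitlines():
        # Accept both 'name value' and 'name=value' forms.
        parts = re.split(r"\s*=\s*|\s+", line.strip(), maxsplit=1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        values.setdefault(parts[0], []).append(to_seconds(parts[1]))
    return values


def duplicates(values):
    """Return only the settings that appear more than once."""
    return {name: vals for name, vals in values.items() if len(vals) > 1}


if __name__ == "__main__":
    for name, vals in duplicates(collect(POSTED)).items():
        status = "consistent" if len(set(vals)) == 1 else "CONFLICTING"
        print(f"{name}: {vals} ({status})")
```

Here the repeated keys carry the same value once normalized, so the repetition is harmless; a conflicting pair would be worth resolving before tuning anything else.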
 
@MattWho, could you advise why the nodes are not joining the cluster?

 

 

2 REPLIES

Contributor

@MattWho, here is the relevant log excerpt:

2020-07-20 17:15:58,357 INFO [Process Cluster Protocol Request-2] o.a.n.c.c.node.NodeClusterCoordinator Received Connection Request from qa-nifi-node-blue-02.abc.com:9091; responding with my DataFlow
2020-07-20 17:15:58,388 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for qa-nifi-node-blue-02.abc.com:9091 -- Received first heartbeat from connecting node. Node connected.
2020-07-20 17:16:07,332 INFO [Process Cluster Protocol Request-2] o.a.n.c.c.node.NodeClusterCoordinator Status of qa-nifi-node-blue-02.abc.com:9091 changed from NodeConnectionStatus[nodeId=qa-nifi-node-blue-02.abc.com:9091, state=CONNECTED, updateId=21] to NodeConnectionStatus[nodeId=qa-nifi-node-blue-02.abc.com:9091, state=CONNECTING, updateId=22]
2020-07-20 17:16:09,000 WARN [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from ip-10-175-123-222.us-west-2.compute.internal due to org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling protocol message in response to message type: CONNECTION_REQUEST due to java.net.SocketException: Broken pipe (Write failed)
org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling protocol message in response to message type: CONNECTION_REQUEST due to java.net.SocketException: Broken pipe (Write failed)
	at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:184)
	at org.apache.nifi.cluster.protocol.jaxb.JaxbProtocolContext$1.marshal(JaxbProtocolContext.java:86)
	at org.apache.nifi.cluster.protocol.impl.SocketProtocolListener.dispatchRequest(SocketProtocolListener.java:182)
2020-07-20 17:16:09,009 INFO [Process Cluster Protocol Request-23] o.a.n.c.p.impl.SocketProtocolListener Finished processing request b20868cb-d4ba-41c4-90ba-07cddda92131 (type=HEARTBEAT, length=3465 bytes) from qa-nifi-node-blue-02.abc.com:9091 in 95 millis
2020-07-20 17:16:41,298 INFO [Process Cluster Protocol Request-24] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 6766e1e4-5181-48aa-9d05-e9c93617afcf (type=CLUSTER_WORKLOAD_REQUEST, length=85 bytes) from ip-10-175-123-222.us-west-2.compute.internal in 133 millis
2020-07-20 17:16:43,489 INFO [Process Cluster Protocol Request-25] o.a.n.c.p.impl.SocketProtocolLis
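
The notable event in this excerpt is the node dropping from CONNECTED back to CONNECTING about nine seconds after its first heartbeat, followed by the broken-pipe failure on the CONNECTION_REQUEST response. A minimal sketch for pulling such status transitions out of a NiFi application log, assuming the line layout shown in this post; the regular expression and the function name `status_changes` are illustrative assumptions, not a NiFi utility.

```python
import re

# Matches lines like:
#   <timestamp> ... Status of <node> changed from
#   NodeConnectionStatus[... state=OLD ...] to NodeConnectionStatus[... state=NEW ...]
STATUS_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"Status of (?P<node>\S+) changed from "
    r"NodeConnectionStatus\[.*?state=(?P<old>\w+).*?\] to "
    r"NodeConnectionStatus\[.*?state=(?P<new>\w+)"
)


def status_changes(lines):
    """Return (timestamp, node, old_state, new_state) for each status-change line."""
    changes = []
    for line in lines:
        m = STATUS_CHANGE.match(line)
        if m:
            changes.append((m["ts"], m["node"], m["old"], m["new"]))
    return changes


# Two lines copied from the excerpt above; only the second is a status change.
SAMPLE = [
    ("2020-07-20 17:15:58,388 INFO [Heartbeat Monitor Thread-1] "
     "o.a.n.c.c.node.NodeClusterCoordinator Event Reported for "
     "qa-nifi-node-blue-02.abc.com:9091 -- Received first heartbeat "
     "from connecting node. Node connected."),
    ("2020-07-20 17:16:07,332 INFO [Process Cluster Protocol Request-2] "
     "o.a.n.c.c.node.NodeClusterCoordinator Status of "
     "qa-nifi-node-blue-02.abc.com:9091 changed from "
     "NodeConnectionStatus[nodeId=qa-nifi-node-blue-02.abc.com:9091, "
     "state=CONNECTED, updateId=21] to "
     "NodeConnectionStatus[nodeId=qa-nifi-node-blue-02.abc.com:9091, "
     "state=CONNECTING, updateId=22]"),
]

if __name__ == "__main__":
    for ts, node, old, new in status_changes(SAMPLE):
        print(f"{ts} {node}: {old} -> {new}")
```

Running the same function over the full log of each node would show whether every node bounces between CONNECTED and CONNECTING during a simultaneous restart, or only some of them.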

 

 

I stopped all nodes and started one node, which became [CONNECTED, PRIMARY, COORDINATOR]. I then started the remaining nodes one by one, and the cluster came up.

 

 

Contributor

This is a multi-AZ AWS cluster: 3 nodes in zone 1 and 3 in zone 2.