The Flow Controller is initializing the Data Flow.

Contributor

I deployed NiFi on a cluster of 10 servers. I have 5 external ZooKeeper servers, which Kafka on the same 10 servers uses successfully. After starting the nifi.service process, the web UI shows the message "The Flow Controller is initializing the Data Flow" and never gets past it. This is what I see in nifi-app.log:
2023-09-27 16:37:11,645 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is located at sd-sagn-rtyev:9082. Will send Cluster Connection Request to this address
2023-09-27 16:37:11,665 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: java.net.SocketException: Broken pipe (Write failed)

help me!

1 REPLY

Master Mentor

@VLban 

From what you have shared, I don't think your NiFi is having any issues communicating with your ZooKeeper (ZK).  When NiFi is running, it sends a heartbeat message to ZK so that ZK knows that node is available.  ZK is used to facilitate the election of two NiFi roles:
1. Cluster coordinator - Only one node in the NiFi cluster can be elected as cluster coordinator.  The cluster coordinator is responsible for replicating requests made from any node to all nodes in the cluster.  This allows NiFi to support a zero-master architecture, meaning that users do not need to connect to the elected cluster coordinator node in order to make changes.  Users can interact with the NiFi cluster from any node.
2. Primary node - Only one node at a time can be elected to this role.  The node with this assigned role will be the only node that schedules component processors configured with "primary node" only execution.

The log output you shared indicates that ZK is receiving these heartbeats from at least some of the 10 nodes (maybe all of them, but we know the node this log came from is talking to ZK fine), which allowed the cluster coordinator election to succeed.  We see that "sd-sagn-rtyev:9082" was elected to the cluster coordinator role.  Once nodes are aware of who the elected cluster coordinator is, they start sending cluster heartbeats to that elected cluster coordinator.  The initial set of heartbeats is used to connect the nodes to the cluster (things like making sure all nodes are running the exact same flow.xml.gz/flow.json.gz and have matching users.xml and authorizations.xml files).
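To see which coordinator address each node believes it should contact, the log line quoted in the question can be parsed. The following is only a minimal sketch: the pattern is based solely on the one log line shown in this thread, and `find_coordinator` is a hypothetical helper name, not part of NiFi.

```python
import re

# Pattern derived from the single INFO line quoted in the question;
# other NiFi versions may word this message differently.
COORDINATOR_RE = re.compile(
    r"Cluster Coordinator is located at (?P<host>[^\s:]+):(?P<port>\d+)"
)


def find_coordinator(log_text):
    """Return (host, port) from the last matching log line, or None if absent."""
    matches = list(COORDINATOR_RE.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    return last.group("host"), int(last.group("port"))


if __name__ == "__main__":
    sample = (
        "2023-09-27 16:37:11,645 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender "
        "Cluster Coordinator is located at sd-sagn-rtyev:9082. "
        "Will send Cluster Connection Request to this address"
    )
    print(find_coordinator(sample))
```

Running this against nifi-app.log on each node and comparing the results is a quick way to confirm that all nodes agree on the same coordinator host and port.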

If your NiFi is secured (running over HTTPS), then all communications between nodes are over mutual TLS encrypted connections.  Based on the exception you shared, it sounds like the connection between the node(s) and the elected cluster coordinator is failing.
1. Make sure that all nodes can properly resolve the cluster hostnames to reachable IP addresses.
2. Make sure that the PrivateKeyEntry in each node's keystore (the keystore configured in nifi.properties) supports both the clientAuth and serverAuth EKUs and has the required host SAN entry(s).
3. Make sure that the truststore used on every node contains the complete trust chain for all the PrivateKeyEntry certificates being used by all 10 nodes.  A private key's certificate may be signed by a root or intermediate CA (an intermediate CA may be signed by another intermediate CA or the root CA).  A complete trust chain consists of ALL trusted public certificates from the signer of the private key's certificate up to the root CA.
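The first check in the list above can be scripted and run from every node. This is a rough sketch using only the Python standard library; the node list is a placeholder (only "sd-sagn-rtyev" and port 9082 come from the log in the question), and `resolve_host` and `can_connect` are hypothetical helper names. It only tests name resolution and plain TCP reachability, and it says nothing about the certificates in checks 2 and 3.

```python
import socket


def resolve_host(name):
    """Return the sorted unique IP addresses a hostname resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder list: replace with the hostnames of all 10 NiFi nodes.
    nodes = ["sd-sagn-rtyev"]
    protocol_port = 9082  # port shown in the log excerpt from the question
    for node in nodes:
        print(node, resolve_host(node), can_connect(node, protocol_port))
```

An empty address list points at DNS or /etc/hosts, while a resolved address with a failed connect points at a firewall, a wrong port, or a node that is not listening.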

If a mutual TLS handshake cannot be established, typically one side or the other will simply close the connection, most commonly as a result of a lack of proper trust.  This would explain the Broken pipe (Write failed), as the client was unable to send the heartbeat CONNECTION_REQUEST to the elected cluster coordinator.
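One way to reproduce the handshake outside of NiFi is to attempt it by hand from a node toward the coordinator. The sketch below makes several assumptions: the node's certificate, private key, and CA chain have already been exported from the keystore and truststore to PEM files (Python's ssl module does not read JKS), the file names are placeholders, and `try_mutual_tls` is a hypothetical helper name.

```python
import socket
import ssl


def try_mutual_tls(host, port, certfile=None, keyfile=None, cafile=None, timeout=5.0):
    """Attempt a TLS handshake, optionally presenting a client certificate.

    Returns (ok, detail): the negotiated protocol on success, or a short reason.
    """
    # With cafile=None the system default CA bundle is used instead.
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # The server certificate is not trusted or does not match the hostname (SAN).
        return False, "certificate verification failed: %s" % exc.verify_message
    except ssl.SSLError as exc:
        return False, "TLS handshake failed: %s" % exc
    except OSError as exc:
        return False, "connection failed: %s" % exc


if __name__ == "__main__":
    # Placeholder PEM paths; host and port are taken from the log in the question.
    print(try_mutual_tls("sd-sagn-rtyev", 9082,
                         certfile="node-cert.pem", keyfile="node-key.pem",
                         cafile="ca-chain.pem"))
```

A verification failure here suggests a truststore or SAN problem on the calling side. A success is not conclusive, because with TLS 1.3 the server may reject the client certificate only after the client-side handshake has returned, which would then surface as a reset or broken pipe on the first write.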

If you found that any of the suggestions/solutions provided helped you with your issue, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt