<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: 3 node cluster managed by 3 node zookeeper cluster, primary failing to startup and throwing IllegalClusterStateException in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411257#M253014</link>
    <pubDate>Mon, 07 Jul 2025 12:54:35 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2025-07-07T12:54:35Z</dc:date>
    <item>
      <title>3 node cluster managed by 3 node zookeeper cluster, primary failing to startup and throwing IllegalClusterStateException</title>
      <link>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411246#M253010</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;
&lt;P&gt;I have a 3 node Apache NiFi cluster, which is managed by a 3 node ZooKeeper cluster.&lt;/P&gt;
&lt;P&gt;The dev cluster worked fine, except that one node frequently dropped off and we sometimes had to restart it manually after renaming its flow.xml.gz and flow.json.gz, after which the node started up fine and connected to the cluster.&lt;/P&gt;
&lt;P&gt;But today, after one node went down, it wouldn't connect back to the cluster (even after renaming the flow gz files). Within a few minutes another node disconnected from the cluster, and the last node, which was the primary at that stage, threw a socket timeout, so I manually restarted it, and now it won't start up, throwing&amp;nbsp;&lt;/P&gt;
&lt;DIV class="message-pane-title"&gt;&lt;STRONG&gt;Invalid State Cannot replicate request to Node oooo-nifiat01.yy.xxx.local:0000 because the node is not connected&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;with the nifi-user.log complaining of&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;&lt;EM&gt;&lt;STRONG&gt;o.a.n.w.a.c.IllegalClusterStateExceptionMapper org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: The Flow Controller is initializing the Data Flow.. Returning Conflict response.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;It looks like the flow.xml.gz/flow.json.gz is corrupted on primary and we have a whole lot of dev which we cannot afford to lose. Could anyone please help in how we can restore the primary node, and once its online, I can bring up the other 2 nodes.&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;Thanks&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;MK&lt;/DIV&gt;
&lt;DIV class="message-pane-title"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Tue, 08 Jul 2025 05:07:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411246#M253010</guid>
      <dc:creator>MK77</dc:creator>
      <dc:date>2025-07-08T05:07:12Z</dc:date>
    </item>
    <item>
      <title>Re: 3 node cluster managed by 3 node zookeeper cluster, primary failing to startup and throwing IllegalClusterStateException</title>
      <link>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411253#M253013</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/112872"&gt;@MK77&lt;/a&gt;,&amp;nbsp;Welcome to our community! To help you get the best possible answer, I have tagged&amp;nbsp;&lt;SPAN&gt;our NiFi experts,&amp;nbsp;&lt;A target="_blank" rel="noopener"&gt;@MattWho,&lt;/A&gt;&amp;nbsp;&lt;A target="_blank" rel="noopener"&gt;@SAMSAL,&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/20288"&gt;@Shelton&lt;/a&gt;&amp;nbsp;&lt;/SPAN&gt;, who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 10:59:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411253#M253013</guid>
      <dc:creator>VidyaSargur</dc:creator>
      <dc:date>2025-07-07T10:59:26Z</dc:date>
    </item>
    <item>
      <title>Re: 3 node cluster managed by 3 node zookeeper cluster, primary failing to startup and throwing IllegalClusterStateException</title>
      <link>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411257#M253014</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/112872"&gt;@MK77&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;First, let's clarify the ZooKeeper (ZK) elected roles in Apache NiFi.&lt;/P&gt;&lt;P&gt;Primary:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;ZK elects one node in the cluster as the "Primary" node.&amp;nbsp; Processor components on the canvas configured with Execution=Primary node will only get scheduled on that elected primary node.&amp;nbsp; No other nodes will schedule these processors to execute.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Cluster Coordinator:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;ZK elects one of the nodes as the cluster coordinator.&amp;nbsp; Other nodes learn which node is the elected cluster coordinator from ZK.&amp;nbsp; All nodes will send node heartbeats to the cluster coordinator to form the cluster.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Any node in the NiFi cluster can be assigned either or both of these roles.&amp;nbsp; There is no guarantee that the same node(s) will always be assigned these roles.&amp;nbsp; Even after the NiFi cluster is formed and roles are assigned, which nodes hold these roles can change.&lt;BR /&gt;&lt;BR /&gt;The flow.json.gz contains the dataflows on the canvas that are loaded on startup.&amp;nbsp; The flow.xml.gz is only loaded if the flow.json.gz is missing.&amp;nbsp; If NiFi loads the dataflow from the flow.xml.gz, it will generate a flow.json.gz from that flow.xml.gz.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now on to your problem...&lt;BR /&gt;&lt;BR /&gt;Neither of the log lines you shared points to any problem:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Invalid State Cannot replicate request to Node &amp;lt;node-hostname:port&amp;gt; because the node is not connected&lt;/LI-CODE&gt;&lt;P&gt;This log line simply tells you that this node can't replicate a request to another node yet because it has not yet connected to the cluster.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;o.a.n.w.a.c.IllegalClusterStateExceptionMapper org.apache.nifi.cluster.manager.exception.IllegalClusterStateException: The Flow Controller is initializing the Data Flow.. Returning Conflict response.&lt;/LI-CODE&gt;&lt;P&gt;This simply tells you that the flow.json.gz is still being initialized (loaded).&amp;nbsp; This process needs to complete before the node finishes startup and can join the cluster.&amp;nbsp; Depending on which Apache NiFi version you are running and the size of your dataflow, this can take some time to complete.&lt;/P&gt;&lt;P&gt;What is the complete version of NiFi you are using?&lt;/P&gt;&lt;P&gt;Without your full logs, it is not possible to tell from what has been shared what is going on, or even whether there really is any corruption in your flow.json.gz.&lt;BR /&gt;&lt;BR /&gt;One thing you can do is configure your NiFi to start up with all components on your canvas stopped instead of in their last known state.&amp;nbsp; This can be helpful if you have recently added a new dataflow that is perhaps causing issues initializing at startup.&lt;BR /&gt;&lt;BR /&gt;This is achieved by changing the following setting in the nifi.properties file. Save a backup of your flow.json.gz before starting NiFi after changing this setting.&amp;nbsp; The saved flow.json.gz will have the original saved state (Running, Stopped, Disabled) of all the components.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;nifi.flowcontroller.autoResumeState=false&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If your NiFi cluster starts fine after making this change, you can restart your dataflows to see if any are having issues.&lt;BR /&gt;&lt;BR /&gt;Beyond the above suggestion, there is not enough information shared to suggest anything else.&lt;/P&gt;&lt;P&gt;Please help our community grow. 
If you found&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Jul 2025 12:54:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/3-node-cluster-managed-by-3-node-zookeeper-cluster-primary/m-p/411257#M253014</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2025-07-07T12:54:35Z</dc:date>
    </item>
  </channel>
</rss>

