<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411689#M253125</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/128946"&gt;@Siva227&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The error you shared occurs when the node is trying to reconnect to the cluster following a disconnection.&amp;nbsp; So first we need to identify why the node disconnected originally.&amp;nbsp; I suspect the node is being disconnected either due to lack of heartbeat or because it failed to process a change request from the cluster coordinator node.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Cluster Size: 6 nodes&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Each Node: 32 vCPUs, 256 GB RAM&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;JVM Heap Memory: 192 GB (configured per node)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Max Timer Driven Thread Count: 192&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Processor Count: Over 10,000 processors across the flows&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Is there a specific reason why you configured NiFi to use so much heap memory?&amp;nbsp; Large heaps like this can result in long stop-the-world garbage collection (GC) pauses.&amp;nbsp; These long stop-the-world events can lead to disconnections as a result of lack of heartbeat from that node.&amp;nbsp; A common mistake is setting the heap very large simply because you have a lot of memory on the node.&amp;nbsp; You want to use the smallest heap that your dataflows need. 
GC does not kick in until heap usage reaches ~80%.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The property below controls the heartbeat interval and lack-of-heartbeat disconnection:&lt;BR /&gt;&amp;nbsp;&lt;SPAN&gt;nifi.cluster.protocol.heartbeat.interval=5 sec&lt;BR /&gt;&lt;BR /&gt;The cluster coordinator will disconnect a node due to lack of heartbeat if a heartbeat has not been received for 8 times this configured value (40 seconds in this case).&amp;nbsp; It is very possible you are encountering GC pauses that last longer than this.&amp;nbsp; I recommend changing your heartbeat interval to 30 sec, which will allow up to 4 mins of missed heartbeats before the cluster coordinator disconnects a node.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The following error you shared, while not the initial cause of the node disconnection, is preventing the node from reconnecting:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Node disconnected due to Proposed Authorizer is not inheritable by the Flow Controller because NiFi has already started the dataflow and Authorizer has differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty.&lt;/LI-CODE&gt;&lt;P&gt;This implies that there are differences between the authorizations.xml file on this node and what the cluster has in its authorizations.xml.&amp;nbsp; You also state this is the error seen very often after a node disconnection?&lt;BR /&gt;&lt;BR /&gt;Are you often modifying or setting up new authorization access policies when you have a node disconnect?&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I'd start by identifying the initial cause of the node disconnection, which I suspect is either lack of heartbeat or a failure to replicate a request to the node, resulting in the node being disconnected.&amp;nbsp; Both can happen with long GC pauses.&lt;/P&gt;&lt;P&gt;Please help our community grow. 
If you found&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jul 2025 13:44:31 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2025-07-22T13:44:31Z</dc:date>
    <item>
      <title>Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411550#M253073</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi Cloudera Community,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We are running Apache NiFi version 1.28.1 in a clustered setup with the following specifications:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Cluster Size: 6 nodes&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Each Node: 32 vCPUs, 256 GB RAM&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;JVM Heap Memory: 192 GB (configured per node)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Max Timer Driven Thread Count: 192&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Processor Count: Over 10,000 processors across the flows&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Java version 11&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;We are experiencing the following issues:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Frequent node disconnections&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Flow synchronization failures during node reconnects&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Occasionally, policies appear empty when nodes rejoin&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;We have ensured the flow.xml.gz, authorizations.xml, and users.xml files are consistent across all nodes. However, the issues still persist.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Could you please advise:&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;What could be causing these frequent node disconnects and flow sync failures?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Is there an upper limit on the number of processors or thread count that could lead to instability?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Are there recommended JVM GC or NiFi tuning settings for high-core, high-memory environments?&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Any insights or tuning recommendations would be greatly appreciated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jul 2025 19:14:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411550#M253073</guid>
      <dc:creator>Siva227</dc:creator>
      <dc:date>2025-07-15T19:14:02Z</dc:date>
    </item>
    <item>
      <title>Re: Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411557#M253078</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/128946"&gt;@Siva227&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our NiFi experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/38301"&gt;@mburgess&lt;/a&gt;&amp;nbsp; who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 02:01:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411557#M253078</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2025-07-16T02:01:51Z</dc:date>
    </item>
    <item>
      <title>Re: Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411565#M253084</link>
      <description>&lt;P&gt;nifi.properties:&lt;BR /&gt;nifi.cluster.protocol.heartbeat.interval=5 sec&lt;/P&gt;&lt;P&gt;nifi.cluster.node.protocol.threads=10&lt;BR /&gt;nifi.cluster.node.protocol.max.threads=50&lt;BR /&gt;nifi.cluster.node.event.history.size=25&lt;BR /&gt;nifi.cluster.node.connection.timeout=2 mins&lt;BR /&gt;nifi.cluster.node.read.timeout=2 mins&lt;BR /&gt;nifi.cluster.node.max.concurrent.requests=150&lt;BR /&gt;nifi.cluster.firewall.file=&lt;BR /&gt;nifi.cluster.flow.election.max.wait.time=1 mins&lt;BR /&gt;nifi.cluster.flow.election.max.candidates=&lt;BR /&gt;nifi.cluster.load.balance.connections.per.node=4&lt;BR /&gt;nifi.cluster.load.balance.max.thread.count=12&lt;BR /&gt;nifi.zookeeper.connect.timeout=30 secs&lt;BR /&gt;nifi.zookeeper.session.timeout=30 secs&lt;/P&gt;&lt;P&gt;zookeeper.properties:&lt;BR /&gt;initLimit=10&lt;BR /&gt;autopurge.purgeInterval=24&lt;BR /&gt;syncLimit=5&lt;BR /&gt;tickTime=2000&lt;BR /&gt;dataDir=./state/zookeeper&lt;BR /&gt;autopurge.snapRetainCount=30&lt;BR /&gt;these are the properties related to nifi and zookeeper&lt;/P&gt;&lt;P&gt;We are seeing below errors in logs all the time&lt;BR /&gt;Node disconnected due to Proposed Authorizer is not inheritable by the Flow Controller because NiFi has already started the dataflow and Authorizer has differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty.&lt;/P&gt;&lt;P&gt;Failed to connect node to cluster because local flow controller partially updated. Administrator should disconnect node and review flow for corruption&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 10:17:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411565#M253084</guid>
      <dc:creator>Siva227</dc:creator>
      <dc:date>2025-07-16T10:17:34Z</dc:date>
    </item>
    <item>
      <title>Re: Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411689#M253125</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/128946"&gt;@Siva227&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The error you shared occurs when the node is trying to reconnect to the cluster following a disconnection.&amp;nbsp; So first we need to identify why the node disconnected originally.&amp;nbsp; I suspect the node is being disconnected either due to lack of heartbeat or because it failed to process a change request from the cluster coordinator node.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Cluster Size: 6 nodes&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Each Node: 32 vCPUs, 256 GB RAM&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;JVM Heap Memory: 192 GB (configured per node)&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Max Timer Driven Thread Count: 192&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Processor Count: Over 10,000 processors across the flows&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;Is there a specific reason why you configured NiFi to use so much heap memory?&amp;nbsp; Large heaps like this can result in long stop-the-world garbage collection (GC) pauses.&amp;nbsp; These long stop-the-world events can lead to disconnections as a result of lack of heartbeat from that node.&amp;nbsp; A common mistake is setting the heap very large simply because you have a lot of memory on the node.&amp;nbsp; You want to use the smallest heap that your dataflows need. 
GC does not kick in until heap usage reaches ~80%.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The property below controls the heartbeat interval and lack-of-heartbeat disconnection:&lt;BR /&gt;&amp;nbsp;&lt;SPAN&gt;nifi.cluster.protocol.heartbeat.interval=5 sec&lt;BR /&gt;&lt;BR /&gt;The cluster coordinator will disconnect a node due to lack of heartbeat if a heartbeat has not been received for 8 times this configured value (40 seconds in this case).&amp;nbsp; It is very possible you are encountering GC pauses that last longer than this.&amp;nbsp; I recommend changing your heartbeat interval to 30 sec, which will allow up to 4 mins of missed heartbeats before the cluster coordinator disconnects a node.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The following error you shared, while not the initial cause of the node disconnection, is preventing the node from reconnecting:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Node disconnected due to Proposed Authorizer is not inheritable by the Flow Controller because NiFi has already started the dataflow and Authorizer has differences: Proposed Authorizations do not match current Authorizations: Proposed fingerprint is not inheritable because the current access policies is not empty.&lt;/LI-CODE&gt;&lt;P&gt;This implies that there are differences between the authorizations.xml file on this node and what the cluster has in its authorizations.xml.&amp;nbsp; You also state this is the error seen very often after a node disconnection?&lt;BR /&gt;&lt;BR /&gt;Are you often modifying or setting up new authorization access policies when you have a node disconnect?&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I'd start by identifying the initial cause of the node disconnection, which I suspect is either lack of heartbeat or a failure to replicate a request to the node, resulting in the node being disconnected.&amp;nbsp; Both can happen with long GC pauses.&lt;/P&gt;&lt;P&gt;Please help our community grow. 
If you found&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 13:44:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411689#M253125</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2025-07-22T13:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: Frequent Node Disconnects and Flow Synchronization Issues in NiFi 1.28.1 with Large Cluster</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411719#M253144</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/128946"&gt;@Siva227&lt;/a&gt;&amp;nbsp;Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jul 2025 20:38:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Frequent-Node-Disconnects-and-Flow-Synchronization-Issues-in/m-p/411719#M253144</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2025-07-24T20:38:14Z</dc:date>
    </item>
  </channel>
</rss>

