<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question NIFI 1.13.2 Cluster with Randomly Restarting Nodes in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316535#M226831</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;&amp;nbsp;, thanks for your response.&amp;nbsp; Before I heard from you I had checked /var/log/messages and searched for OOM messages without success.&amp;nbsp; However, there were no indications of NiFi dying in nifi-app.log, so I dug a little further.&amp;nbsp; As it turned out, I did find the OS killing the NiFi child process.&amp;nbsp; I reduced the JVM heap to 24 GB and the OS no longer feels the need to kill it.&amp;nbsp; So that problem is solved.&amp;nbsp; As for the JVM, we were attempting to keep as much of our flow queues in memory to reduce disk I/O.&amp;nbsp; Is there a better strategy?&lt;/P&gt;</description>
    <pubDate>Mon, 17 May 2021 13:03:36 GMT</pubDate>
    <dc:creator>Kilynn</dc:creator>
    <dc:date>2021-05-17T13:03:36Z</dc:date>
    <item>
      <title>NIFI 1.13.2 Cluster with Randomly Restarting Nodes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316430#M226800</link>
      <description>&lt;P&gt;NiFi 1.13.2 has nodes randomly restarting with connections refused.&lt;/P&gt;&lt;P&gt;Cluster information&lt;/P&gt;&lt;P&gt;12 nodes, 8 cores / 32 GB RAM each&lt;/P&gt;&lt;H1&gt;bootstrap.conf&lt;/H1&gt;&lt;PRE&gt;java=java&lt;BR /&gt;run.as=&lt;BR /&gt;lib.dir=./lib&lt;BR /&gt;conf.dir=./conf&lt;BR /&gt;graceful.shutdown.seconds=20&lt;BR /&gt;java.arg.1=-Dorg.apache.jasper.compiler.disablejsr199=true&lt;BR /&gt;java.arg.2=-Xms28g&lt;BR /&gt;java.arg.3=-Xmx28g&lt;BR /&gt;java.arg.4=-Djava.net.preferIPv4Stack=true&lt;BR /&gt;java.arg.5=-Dsun.net.http.allowRestrictedHeaders=true&lt;BR /&gt;java.arg.6=-Djava.protocol.handler.pkgs=sun.net.www.protocol&lt;BR /&gt;java.arg.7=-XX:ReservedCodeCacheSize=256m&lt;BR /&gt;java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m&lt;BR /&gt;java.arg.9=-XX:+UseCodeCacheFlushing&lt;BR /&gt;java.arg.14=-Djava.awt.headless=true&lt;BR /&gt;nifi.bootstrap.sensitive.key=&lt;BR /&gt;java.arg.15=-Djava.security.egd=file:/dev/urandom&lt;BR /&gt;java.arg.16=-Djavax.security.auth.useSubjectCredsOnly=true&lt;BR /&gt;java.arg.17=-Dzookeeper.admin.enableServer=false&lt;BR /&gt;java.arg.snappy=-Dorg.xerial.snappy.tempdir=/opt/nifi/tmp&lt;BR /&gt;notification.services.file=./conf/bootstrap-notification-services.xml&lt;BR /&gt;notification.max.attempts=5&lt;/PRE&gt;&lt;PRE&gt;# cluster node properties (only configure for cluster nodes) #&lt;BR /&gt;nifi.cluster.is.node=true&lt;BR /&gt;#nifi.cluster.node.address=ip-xx-xxx-xxx-xxx.us-gov-west-1.compute.internal&lt;BR /&gt;nifi.cluster.node.address=192.170.108.140&lt;BR /&gt;nifi.cluster.node.protocol.port=11443&lt;BR /&gt;nifi.cluster.node.protocol.threads=100&lt;BR /&gt;nifi.cluster.node.protocol.max.threads=800&lt;BR /&gt;nifi.cluster.node.event.history.size=25&lt;BR /&gt;nifi.cluster.node.connection.timeout=60 sec&lt;BR /&gt;nifi.cluster.node.read.timeout=60 sec&lt;BR /&gt;nifi.cluster.node.max.concurrent.requests=800&lt;BR /&gt;nifi.cluster.firewall.file=&lt;BR /&gt;nifi.cluster.flow.election.max.wait.time=5 mins&lt;BR 
/&gt;nifi.cluster.flow.election.max.candidates=7&lt;BR /&gt;&lt;BR /&gt;# cluster load balancing properties #&lt;BR /&gt;nifi.cluster.load.balance.host=192.170.108.140&lt;BR /&gt;nifi.cluster.load.balance.port=6342&lt;BR /&gt;nifi.cluster.load.balance.connections.per.node=50&lt;BR /&gt;nifi.cluster.load.balance.max.thread.count=600&lt;BR /&gt;nifi.cluster.load.balance.comms.timeout=45 sec&lt;BR /&gt;&lt;BR /&gt;# zookeeper properties, used for cluster management #&lt;BR /&gt;nifi.zookeeper.connect.string=192.170.108.37:2181,192.170.108.67:2181,192.170.108.120:2181,192.170.108.104:2181,192.170.108.106:2181&lt;BR /&gt;nifi.zookeeper.connect.timeout=30 secs&lt;BR /&gt;nifi.zookeeper.session.timeout=30 secs&lt;BR /&gt;nifi.zookeeper.root.node=/nifi_tf&lt;/PRE&gt;&lt;H1&gt;Sample Error&lt;/H1&gt;&lt;PRE&gt;2021-05-14 13:00:11,927 ERROR [Load-Balanced Client Thread-356] o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to nifi-tf-11.bogus-dns.pvt:9443 for load balancing&lt;BR /&gt;java.net.ConnectException: Connection refused&lt;BR /&gt;at sun.nio.ch.Net.connect0(Native Method)&lt;BR /&gt;at sun.nio.ch.Net.connect(Net.java:482)&lt;BR /&gt;at sun.nio.ch.Net.connect(Net.java:474)&lt;BR /&gt;at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)&lt;BR /&gt;at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:107)&lt;BR /&gt;at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)&lt;BR /&gt;at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.createChannel(NioAsyncLoadBalanceClient.java:456)&lt;BR /&gt;at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.establishConnection(NioAsyncLoadBalanceClient.java:399)&lt;BR /&gt;at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.communicate(NioAsyncLoadBalanceClient.java:211)&lt;BR /&gt;at 
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:81)&lt;BR /&gt;at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)&lt;BR /&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)&lt;BR /&gt;at java.util.concurrent.FutureTask.run(FutureTask.java:266)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)&lt;BR /&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)&lt;BR /&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)&lt;BR /&gt;at java.lang.Thread.run(Thread.java:748)&lt;BR /&gt;2021-05-14 13:00:15,592 ERROR [Load-Balanced Client Thread-160] o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to nifi-tf-11.bogus-dns.pvt:9443 for load balancing&lt;/PRE&gt;&lt;H1&gt;&amp;nbsp;&lt;/H1&gt;&lt;H1&gt;System Diagnostics&lt;/H1&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Kilynn_3-1621009187973.png" style="width: 238px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/31179i7689418C8A57F220/image-size/large?v=v2&amp;amp;px=999" role="button" title="Kilynn_3-1621009187973.png" alt="Kilynn_3-1621009187973.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Kilynn_2-1621009162488.png" style="width: 229px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/31178iFFD6FE045395E8EC/image-size/large?v=v2&amp;amp;px=999" role="button" title="Kilynn_2-1621009162488.png" alt="Kilynn_2-1621009162488.png" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Kilynn_1-1621009146078.png" 
style="width: 299px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/31177iA11C5F352A0CE5B8/image-size/large?v=v2&amp;amp;px=999" role="button" title="Kilynn_1-1621009146078.png" alt="Kilynn_1-1621009146078.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 09:03:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316430#M226800</guid>
      <dc:creator>Kilynn</dc:creator>
      <dc:date>2026-04-21T09:03:50Z</dc:date>
    </item>
    <item>
      <title>Re: NIFI 1.13.2 Cluster with Randomly Restarting Nodes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316435#M226801</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78936"&gt;@Kilynn&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Can you provide more detail around "&lt;SPAN&gt;nodes randomly restarting"?&lt;BR /&gt;What do you see in the nifi-app.log and nifi-bootstrap.log when the node(s) restart?&lt;BR /&gt;Do you see anything in /var/log/messages?&amp;nbsp; Maybe OOM Killer?&lt;BR /&gt;&lt;BR /&gt;I see Max heap shows 330 GB configured.&amp;nbsp; With 12 nodes that is ~27GB set per node.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;Any particular reason why you set heap so high for your NiFi nodes?&lt;BR /&gt;Larger heaps mean longer stop-the-world GC pauses.&lt;BR /&gt;&lt;BR /&gt;What version of Java are you using?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 14 May 2021 18:12:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316435#M226801</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2021-05-14T18:12:58Z</dc:date>
    </item>
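Matt's suggestion to check /var/log/messages for OOM-killer activity can be scripted. A minimal sketch; the log path assumes a RHEL-style host where kernel messages land in /var/log/messages (on other distros try `journalctl -k` or /var/log/syslog):

```shell
# Sketch: look for OOM-killer activity after an unexplained NiFi restart.
# grep -c prints a match count; `|| true` keeps the exit status clean when
# the file is missing or has no matches, so this is safe under `set -e`.
oom_hits=$(grep -icE "out of memory|killed process" /var/log/messages 2>/dev/null || true)
echo "OOM-killer log lines: ${oom_hits:-0}"
```

A non-zero count is a strong hint the kernel, not NiFi itself, terminated the JVM, which is why nifi-app.log shows no shutdown.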
    <item>
      <title>NIFI 1.13.2 Cluster with Randomly Restarting Nodes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316535#M226831</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;&amp;nbsp;, thanks for your response.&amp;nbsp; Before I heard from you I had checked /var/log/messages and searched for OOM messages without success.&amp;nbsp; However, there were no indications of NiFi dying in nifi-app.log, so I dug a little further.&amp;nbsp; As it turned out, I did find the OS killing the NiFi child process.&amp;nbsp; I reduced the JVM heap to 24 GB and the OS no longer feels the need to kill it.&amp;nbsp; So that problem is solved.&amp;nbsp; As for the JVM, we were attempting to keep as much of our flow queues in memory to reduce disk I/O.&amp;nbsp; Is there a better strategy?&lt;/P&gt;</description>
      <pubDate>Mon, 17 May 2021 13:03:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316535#M226831</guid>
      <dc:creator>Kilynn</dc:creator>
      <dc:date>2021-05-17T13:03:36Z</dc:date>
    </item>
    <item>
      <title>Re: NIFI 1.13.2 Cluster with Randomly Restarting Nodes</title>
      <link>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316811#M226901</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/78936"&gt;@Kilynn&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The following property in the nifi.properties file controls when a swap file is created per connection.&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&lt;STRONG&gt;nifi.queue.swap.threshold=20000&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;This is per connection and not for all FlowFiles across all connections.&amp;nbsp; A FlowFile swap file always consists of 10000 FlowFiles.&lt;BR /&gt;If a connection reaches 20000 queued FlowFiles, a swap file will be created for 10000 of those.&amp;nbsp; So if a connection queue reaches 40000, you would have 3 swap files for that connection.&lt;BR /&gt;&lt;BR /&gt;You can control individual connection queues by setting the "Back pressure Object Threshold" on a connection:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_0-1621516074494.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/31237iD0C9AD41E4671D9C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_0-1621516074494.png" alt="MattWho_0-1621516074494.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Note: Threshold settings are soft limits.&lt;BR /&gt;The default object threshold is 10000.&amp;nbsp;&lt;BR /&gt;So with these settings there should be very little to no swapping of FlowFiles to disk at all.&amp;nbsp; Swap files would only be created if the source processor feeding a connection outputs enough FlowFiles at one time to trigger a swap file.&lt;BR /&gt;&lt;BR /&gt;For example:&lt;BR /&gt;- The connection has 9000 queued FlowFiles, so back pressure is not being applied.&lt;BR /&gt;- The source processor is thus allowed to execute.&lt;BR /&gt;- The source processor upon execution produces 12000 FlowFiles.&lt;BR /&gt;- The downstream connection now has 21000 queued FlowFiles.&amp;nbsp; One swap file is produced and back pressure is enabled until the queue drops back below 10000 queued FlowFiles.&lt;BR /&gt;&lt;BR /&gt;FlowFiles consist of two parts (FlowFile attributes/metadata and FlowFile content).&amp;nbsp; The only portion of a FlowFile held in heap memory is the FlowFile attributes/metadata.&amp;nbsp; FlowFile content is never held in heap (some processors may load content into memory in order to perform their function).&lt;BR /&gt;&lt;BR /&gt;FlowFile attributes/metadata are persisted to the FlowFile repository and FlowFile content is written to the content repository.&amp;nbsp; This is important to avoid data loss if NiFi dies or is restarted while data still exists in connection queues.&lt;BR /&gt;&lt;BR /&gt;If you found this helped with your query, please take a moment to log in and click accept on the solutions that helped.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2021 13:17:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/NIFI-1-13-2-Cluster-with-Randomly-Restarting-Nodes/m-p/316811#M226901</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2021-05-20T13:17:30Z</dc:date>
    </item>
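Read together, the thread's resolution and Matt's swap-file explanation point at a sizing rule: FlowFile content never lives in heap, so an oversized heap on a 32 GB host mostly starves the OS until the OOM killer fires. A hedged config sketch of the values the thread converged on (the 24g figure is from the poster's follow-up; the swap threshold is the value Matt quotes; everything else should match your own files):

```properties
# bootstrap.conf (sketch): keep the JVM heap well below physical RAM so the
# OS, page cache, and off-heap allocations are not starved. 24g on a 32 GB
# host is what the poster settled on; tune for your own flow.
java.arg.2=-Xms24g
java.arg.3=-Xmx24g

# nifi.properties (sketch): per-connection swap threshold from Matt's reply.
# Only FlowFile attributes/metadata are held in heap, never content, so a
# larger heap buys little for queue-heavy flows.
nifi.queue.swap.threshold=20000
```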
  </channel>
</rss>

