<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can NIFI storage be extended? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Can-NIFI-storage-be-extended/m-p/306356#M222807</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/74983"&gt;@dzbeda&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Can you share a little more about your use case?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;NiFi does not expire data that is actively queued within connections between components added to the NiFi canvas.&amp;nbsp; So I am a bit curious about the "&lt;SPAN&gt;I don't want to lose data" statement you made.&lt;BR /&gt;&lt;BR /&gt;It is true that during "connectivity issues between the sites" NiFi FlowFiles may accumulate within the connection queues, so more storage is needed to hold that queued data while you wait for connectivity to be restored. That is still not a "data loss" concern unless your ingest uses an unconfirmed transfer protocol such as UDP.&amp;nbsp; NiFi's Site-To-Site protocol, used by Remote Process Groups, uses a two-phase commit to avoid data loss.&lt;BR /&gt;&lt;BR /&gt;Backpressure settings on each connection control how many FlowFiles can queue before the component feeding FlowFiles into that connection is no longer allowed to execute. So during an extended outage or under high volume, backpressure could end up being applied to every connection from the last component in your dataflow back to the first.&amp;nbsp; The default backpressure thresholds are 10,000 FlowFiles (object threshold) or 1 GB of content (size threshold) per connection. Keep in mind these are soft limits.&amp;nbsp; It is not advisable to simply set backpressure to some much larger value.
I recommend reading the following article:&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Dissecting-the-NiFi-quot-connection-quot-Heap-usage-and/ta-p/248166&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;As for what happens when the content repo(s) are full (NiFi allows you to configure multiple content repos per NiFi instance): NiFi simply cannot generate any new content.&amp;nbsp; Any component that tries to create new content (at ingest, or via a processor that modifies the content of an existing FlowFile) will fail with an out of disk space exception when it tries to do so. This does not mean data loss (unless, as I mentioned, your ingest or egress uses an unconfirmed protocol).&amp;nbsp; The component will simply try again until it succeeds once disk space becomes available (for example, when connectivity returns and data can be pushed out).&lt;BR /&gt;&lt;BR /&gt;With a confirmed transfer protocol, data remains on the source once backpressure is applied all the way back to your ingest-based components.&lt;BR /&gt;&lt;BR /&gt;NiFi archiving has nothing to do with how long FlowFiles are kept in NiFi's dataflow connections.&amp;nbsp; Archiving holds FlowFiles after they have successfully been removed (reached a point of auto-termination in a dataflow).&amp;nbsp; Archiving allows you to view old FlowFiles that are no longer queued, or replay a FlowFile from any point in your dataflow.&amp;nbsp; However, there is no bulk replay capability, so archiving is not useful for that.&lt;BR /&gt;&lt;A href="https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418" target="_blank"&gt;https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418&lt;/A&gt;&lt;BR /&gt;&lt;BR 
/&gt;Hope this helps,&lt;BR /&gt;Matt&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 23 Nov 2020 13:34:11 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2020-11-23T13:34:11Z</dc:date>
  </channel>
</rss>

