<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NiFi Repository - Typical Disk Usage Ratios among the repositories in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122966#M22585</link>
    <description>&lt;P&gt;There is no direct correlation between the size of the content repository and the size of the provenance repository.  The size the content repository grows to is directly tied to the amount of unique content currently queued on the NiFi canvas.  If archiving is enabled, the amount of content repository space consumed also depends on the archive configuration settings in the nifi.properties file:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.content.repository.archive.max.retention.period=12 hours &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;
nifi.content.repository.archive.max.usage.percentage=75%&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.content.repository.archive.enabled=true&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the settings above, the archive will&lt;STRONG&gt;&lt;EM&gt; try&lt;/EM&gt;&lt;/STRONG&gt; to retain 12 hours of archived content (archived content being content that is no longer associated with any FlowFile queued in a dataflow on the graph).  This does not guarantee that there will be any archive, or that the content repository will not grow beyond 75% disk utilization.  Content still actively associated with queued FlowFiles remains in the content repository.  It is therefore important to build back pressure into dataflows where there is concern that large backlogs could fill the disk to 100%.  If the content repository does fill to 100%, corruption will not occur, but new FlowFiles cannot be created until free space is available.  This is likely to produce a lot of errors in the flow (anywhere content is modified or written).&lt;/P&gt;&lt;P&gt;Provenance repository size is directly related to the number of FlowFiles and the number of event-generating processors those FlowFiles pass through on the NiFi canvas. 
Disk utilization here is tightly controlled by settings in the nifi.properties file:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.provenance.repository.max.storage.time=7 days &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.provenance.repository.max.storage.size=50 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the above settings, NiFi will&lt;STRONG&gt;&lt;EM&gt; try &lt;/EM&gt;&lt;/STRONG&gt;to retain 7 days of provenance events for every FlowFile it processes, but will start rolling off the oldest events once storage exceeds 50 GB.&lt;/P&gt;&lt;P&gt;It is important to understand that the 75% and 50 GB values are soft limits and should never be set to 100% or to the exact size of the disk.&lt;/P&gt;&lt;P&gt;The FlowFile repository and the database repository each remain relatively small.  The FlowFile repository is the &lt;EM&gt;most &lt;/EM&gt;important repo of all.  It should be isolated on a separate disk/partition that is not shared with any other process that may fill it.  Allowing the FlowFile repository disk to fill to 100% can lead to database corruption and lost data.  For a 200 GB content repository, a ~25 GB FlowFile repo should be enough.  The database repository contains the user and change history DBs.  The user DB will remain 0 bytes in size for NiFi instances running over http (non-secure).  For instances running over https (secure), the user DB tracks all users who log in to the UI.  The change history DB is tied to the little clock icon in the upper right corner of the NiFi toolbar.  It keeps track of all changes made on the NiFi graph/canvas.  It also stays relatively small.  A few GB of space should be plenty to store a considerable number of changes.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Mar 2016 02:23:27 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2016-03-16T02:23:27Z</dc:date>
    <item>
      <title>NiFi Repository - Typical Disk Usage Ratios among the repositories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122965#M22584</link>
      <description>&lt;P&gt;Do we have any experience with typical disk usage ratios for each of the repositories (FlowFile, content, and provenance)? E.g., if content requires 200 GB of storage, would the provenance and FlowFile repositories require 20 GB (for typical flows)?&lt;/P&gt;&lt;P&gt;I am trying to use this information to decide how best to slice up a NiFi server that has 12 local drives, e.g. 8 drives allocated for content, 2 for FlowFile, and 2 for provenance.&lt;/P&gt;&lt;P&gt;Appreciate any thoughts!&lt;/P&gt;</description>
      <pubDate>Sat, 12 Mar 2016 00:44:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122965#M22584</guid>
      <dc:creator>wfloyd</dc:creator>
      <dc:date>2016-03-12T00:44:53Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi Repository - Typical Disk Usage Ratios among the repositories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122966#M22585</link>
      <description>&lt;P&gt;There is no direct correlation between the size of the content repository and the size of the provenance repository.  The size the content repository grows to is directly tied to the amount of unique content currently queued on the NiFi canvas.  If archiving is enabled, the amount of content repository space consumed also depends on the archive configuration settings in the nifi.properties file:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.content.repository.archive.max.retention.period=12 hours &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;
nifi.content.repository.archive.max.usage.percentage=75%&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.content.repository.archive.enabled=true&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the settings above, the archive will&lt;STRONG&gt;&lt;EM&gt; try&lt;/EM&gt;&lt;/STRONG&gt; to retain 12 hours of archived content (archived content being content that is no longer associated with any FlowFile queued in a dataflow on the graph).  This does not guarantee that there will be any archive, or that the content repository will not grow beyond 75% disk utilization.  Content still actively associated with queued FlowFiles remains in the content repository.  It is therefore important to build back pressure into dataflows where there is concern that large backlogs could fill the disk to 100%.  If the content repository does fill to 100%, corruption will not occur, but new FlowFiles cannot be created until free space is available.  This is likely to produce a lot of errors in the flow (anywhere content is modified or written).&lt;/P&gt;&lt;P&gt;Provenance repository size is directly related to the number of FlowFiles and the number of event-generating processors those FlowFiles pass through on the NiFi canvas. 
Disk utilization here is tightly controlled by settings in the nifi.properties file:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.provenance.repository.max.storage.time=7 days &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.provenance.repository.max.storage.size=50 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the above settings, NiFi will&lt;STRONG&gt;&lt;EM&gt; try &lt;/EM&gt;&lt;/STRONG&gt;to retain 7 days of provenance events for every FlowFile it processes, but will start rolling off the oldest events once storage exceeds 50 GB.&lt;/P&gt;&lt;P&gt;It is important to understand that the 75% and 50 GB values are soft limits and should never be set to 100% or to the exact size of the disk.&lt;/P&gt;&lt;P&gt;The FlowFile repository and the database repository each remain relatively small.  The FlowFile repository is the &lt;EM&gt;most &lt;/EM&gt;important repo of all.  It should be isolated on a separate disk/partition that is not shared with any other process that may fill it.  Allowing the FlowFile repository disk to fill to 100% can lead to database corruption and lost data.  For a 200 GB content repository, a ~25 GB FlowFile repo should be enough.  The database repository contains the user and change history DBs.  The user DB will remain 0 bytes in size for NiFi instances running over http (non-secure).  For instances running over https (secure), the user DB tracks all users who log in to the UI.  The change history DB is tied to the little clock icon in the upper right corner of the NiFi toolbar.  It keeps track of all changes made on the NiFi graph/canvas.  It also stays relatively small.  A few GB of space should be plenty to store a considerable number of changes.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Mar 2016 02:23:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122966#M22585</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2016-03-16T02:23:27Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi Repository - Typical Disk Usage Ratios among the repositories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122967#M22586</link>
      <description>&lt;P&gt;
	For your scenario with 12 disks (assuming all disks are 200 GB):&lt;/P&gt;&lt;P&gt;
	You can specify/define multiple Content repos and multiple Provenance repos; however, you can only define one FlowFile repository and one database repository.&lt;/P&gt;&lt;H3&gt;
	- 8 disks for Content repos:&lt;/H3&gt;&lt;P&gt;
	- &lt;STRONG&gt;/cont_repo1   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	- &lt;STRONG&gt;/cont_repo2   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo3   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo4   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo5   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo6   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo7   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /cont_repo8   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;H3&gt;
	- 2 disks for Provenance repos:&lt;/H3&gt;&lt;P&gt;
	&lt;STRONG&gt;- /prov_repo1 &amp;lt;-- 200 GB &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;- /prov_repo2   &amp;lt;-- 200 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;H3&gt;
	- 1 disk split into multiple partitions for:&lt;/H3&gt;&lt;P&gt;
	       -&lt;STRONG&gt; /var/log/nifi-logs/   &amp;lt;-- 100 GB&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	       -&lt;STRONG&gt; OS partitions  &amp;lt;-- split among the other standard OS partitions (/tmp, /, etc.)&lt;/STRONG&gt;&lt;/P&gt;&lt;H3&gt;
	- 1 disk split into multiple partitions for:&lt;/H3&gt;&lt;P&gt;
	      -&lt;STRONG&gt; /opt/nifi     &amp;lt;-- 50 GB&lt;/STRONG&gt; &lt;/P&gt;&lt;P&gt;
	      - &lt;STRONG&gt;/flowfile_repo/   &amp;lt;-- 50 GB&lt;/STRONG&gt; &lt;/P&gt;&lt;P&gt;
	      -&lt;STRONG&gt; /database_repo/ &amp;lt;-- 25 GB&lt;/STRONG&gt; &lt;/P&gt;&lt;P&gt;
	      - &lt;STRONG&gt;/opt/configuration-resources  &amp;lt;-- 25 GB&lt;/STRONG&gt;  (this will hold any certs, config files, and extras your NiFi processors/dataflows may need).&lt;/P&gt;&lt;P style="margin-left: 20px;"&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Mar 2016 02:33:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/122967#M22586</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2016-03-16T02:33:55Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi Repository - Typical Disk Usage Ratios among the repositories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/404324#M22587</link>
      <description>&lt;P&gt;Is there any reason why the provenance repository would not start deleting files once it has reached the max storage limit?&lt;/P&gt;&lt;P&gt;In my case, even though&amp;nbsp;&lt;STRONG&gt;nifi.provenance.repository.max.storage.size=10 GB&lt;/STRONG&gt;, it's not deleting anything. But when checking the files in provenance, it seems it's following the days limit, as the oldest file is 7 days old. There are a lot of toc tmp files under the folder.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;nifi.provenance.repository.max.storage.time=7 days&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Is there anything I should look into for this?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 14:55:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/404324#M22587</guid>
      <dc:creator>Scorpy257</dc:creator>
      <dc:date>2025-03-18T14:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi Repository - Typical Disk Usage Ratios among the repositories</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/404337#M22588</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/100654"&gt;@Scorpy257&lt;/a&gt;&amp;nbsp;As this is an older post, you would have a better chance of receiving a resolution by &lt;A href="https://community.cloudera.com/t5/forums/postpage/board-id/Questions" target="_blank"&gt;starting a new thread&lt;/A&gt;. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Mar 2025 18:22:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-Repository-Typical-Disk-Usage-Ratios-among-the/m-p/404337#M22588</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2025-03-18T18:22:23Z</dc:date>
    </item>
  </channel>
</rss>

