<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Optimize NiFi Flow for Log-Based FlowFile Status Tracking in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Optimize-NiFi-Flow-for-Log-Based-FlowFile-Status-Tracking/m-p/406812#M252566</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/125920"&gt;@ajaykumardev32&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I would redesign your dataflow to avoid splitting the FlowFiles produced by the TailFile processor.&amp;nbsp; NiFi FlowFile content is immutable (it cannot be modified once created).&amp;nbsp; Any time the content of a FlowFile is modified, the new content is written to a new NiFi content claim.&amp;nbsp; If the processor has an "original" relationship, an entirely new FlowFile is created (both metadata and content).&amp;nbsp; Processors without an "original" relationship that modify FlowFile content simply update the existing FlowFile's metadata to point to the new content claim.&amp;nbsp; So your SplitText processor is producing a lot of new FlowFiles, and you then have inefficient thread usage downstream, where processors are executing against many small FlowFiles.&lt;BR /&gt;&lt;BR /&gt;As far as the Provenance repository goes, you can configure the maximum amount of storage it can use before purging older provenance events.&amp;nbsp; The Content and FlowFile repositories should not be on the same disk, since it is possible for the content repository to fill the disk to 100%. You want to protect your FlowFile repository from filling to 100% by placing it on a different physical or logical drive.&lt;BR /&gt;&lt;BR /&gt;Try utilizing the available "record" based processors instead, to avoid splitting FlowFiles and to do the record conversion/transform/modification.&amp;nbsp; Take a look at these record processors to see if they fit your use case:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.28.0/org.apache.nifi.processors.standard.UpdateRecord/index.html" target="component-usage"&gt;UpdateRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.28.0/org.apache.nifi.processors.standard.ConvertRecord/index.html" target="component-usage"&gt;ConvertRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-jolt-record-nar/1.28.0/org.apache.nifi.processors.jolt.record.JoltTransformRecord/index.html" target="component-usage"&gt;JoltTransformRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.28.0/org.apache.nifi.processors.script.ScriptedTransformRecord/index.html" target="component-usage"&gt;ScriptedTransformRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You are already using a "record" based processor to write to your destination DB.&lt;/P&gt;&lt;P&gt;Other strategies involve adjusting the "Maximum Timer Driven Thread Count" and processor "Concurrent Tasks" settings.&amp;nbsp; You'll need to carefully monitor CPU load average as you make incremental adjustments; once you max out your CPU, there is no gain from raising these settings further.&amp;nbsp; Setting "Concurrent Tasks" too high on any one processor can actually lead to worse overall performance in your dataflow.&amp;nbsp; Small increments with careful monitoring are the proper path to optimization in this area.&lt;/P&gt;&lt;P&gt;Please help our community grow. If&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 22 Apr 2025 12:34:26 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2025-04-22T12:34:26Z</dc:date>
    <item>
      <title>Optimize NiFi Flow for Log-Based FlowFile Status Tracking</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Optimize-NiFi-Flow-for-Log-Based-FlowFile-Status-Tracking/m-p/406803#M252563</link>
      <description>&lt;P class=""&gt;Hi Community,&lt;/P&gt;&lt;P class=""&gt;I'm working on a NiFi setup where I use a dedicated template to track the status of FlowFiles from various other templates. The status of each FlowFile is logged in a specific pattern, and I'm using this pattern to extract and persist status information.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ajaykumardev32_0-1745314520410.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/45136i4E287EFA18662254/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ajaykumardev32_0-1745314520410.png" alt="ajaykumardev32_0-1745314520410.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P class=""&gt;Here's a brief overview of the current approach:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;TailFile Processor&lt;/STRONG&gt; reads log entries from a specific log file.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;SplitText Processor&lt;/STRONG&gt; splits the log content line by line.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;ExtractGrok Processor&lt;/STRONG&gt; extracts relevant fields using a defined Grok pattern.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;ReplaceText Processor&lt;/STRONG&gt; restructures the data to a desired format (e.g., JSON).&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;PutDatabaseRecord Processor&lt;/STRONG&gt; stores the structured data into a database.&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;H3&gt;Problems Faced:&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Queue Build-Up &amp;amp; Performance Bottleneck:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;TailFile often brings in large chunks of data, especially under high log volume.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;The SplitText processor cannot keep up with the rate of incoming 
data.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;This leads to large unprocessed FlowFiles piling up in the queue.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;FlowFile Explosion &amp;amp; Choking:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Once a large FlowFile is split, it results in a burst of many smaller FlowFiles.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;This sudden expansion causes congestion and chokes downstream processors.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Repository Storage Issues:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;The above behavior leads to excessive usage of the &lt;STRONG&gt;FlowFile Repository&lt;/STRONG&gt;, &lt;STRONG&gt;Content Repository&lt;/STRONG&gt;, and &lt;STRONG&gt;Provenance Repository&lt;/STRONG&gt;.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Over time, this is causing storage concerns and performance degradation.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;My Question:&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;Is there a way to optimize this flow to:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Reduce the memory and storage pressure on NiFi repositories?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Handle incoming log data more efficiently without overwhelming the system?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Or, is there a better architectural pattern to achieve log-based FlowFile tracking across templates?&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;Any guidance or best practices would be greatly appreciated.&lt;/P&gt;&lt;P class=""&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 22 Apr 2025 09:40:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Optimize-NiFi-Flow-for-Log-Based-FlowFile-Status-Tracking/m-p/406803#M252563</guid>
      <dc:creator>ajaykumardev32</dc:creator>
      <dc:date>2025-04-22T09:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: Optimize NiFi Flow for Log-Based FlowFile Status Tracking</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Optimize-NiFi-Flow-for-Log-Based-FlowFile-Status-Tracking/m-p/406812#M252566</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/125920"&gt;@ajaykumardev32&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I would redesign your dataflow to avoid splitting the FlowFiles produced by the TailFile processor.&amp;nbsp; NiFi FlowFile content is immutable (it cannot be modified once created).&amp;nbsp; Any time the content of a FlowFile is modified, the new content is written to a new NiFi content claim.&amp;nbsp; If the processor has an "original" relationship, an entirely new FlowFile is created (both metadata and content).&amp;nbsp; Processors without an "original" relationship that modify FlowFile content simply update the existing FlowFile's metadata to point to the new content claim.&amp;nbsp; So your SplitText processor is producing a lot of new FlowFiles, and you then have inefficient thread usage downstream, where processors are executing against many small FlowFiles.&lt;BR /&gt;&lt;BR /&gt;As far as the Provenance repository goes, you can configure the maximum amount of storage it can use before purging older provenance events.&amp;nbsp; The Content and FlowFile repositories should not be on the same disk, since it is possible for the content repository to fill the disk to 100%. You want to protect your FlowFile repository from filling to 100% by placing it on a different physical or logical drive.&lt;BR /&gt;&lt;BR /&gt;Try utilizing the available "record" based processors instead, to avoid splitting FlowFiles and to do the record conversion/transform/modification.&amp;nbsp; Take a look at these record processors to see if they fit your use case:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.28.0/org.apache.nifi.processors.standard.UpdateRecord/index.html" target="component-usage"&gt;UpdateRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.28.0/org.apache.nifi.processors.standard.ConvertRecord/index.html" target="component-usage"&gt;ConvertRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-jolt-record-nar/1.28.0/org.apache.nifi.processors.jolt.record.JoltTransformRecord/index.html" target="component-usage"&gt;JoltTransformRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;A class="component-link" href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.28.0/org.apache.nifi.processors.script.ScriptedTransformRecord/index.html" target="component-usage"&gt;ScriptedTransformRecord&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class="version"&gt;1.28.0&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You are already using a "record" based processor to write to your destination DB.&lt;/P&gt;&lt;P&gt;Other strategies involve adjusting the "Maximum Timer Driven Thread Count" and processor "Concurrent Tasks" settings.&amp;nbsp; You'll need to carefully monitor CPU load average as you make incremental adjustments; once you max out your CPU, there is no gain from raising these settings further.&amp;nbsp; Setting "Concurrent Tasks" too high on any one processor can actually lead to worse overall performance in your dataflow.&amp;nbsp; Small increments with careful monitoring are the proper path to optimization in this area.&lt;/P&gt;&lt;P&gt;Please help our community grow. If&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Apr 2025 12:34:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Optimize-NiFi-Flow-for-Log-Based-FlowFile-Status-Tracking/m-p/406812#M252566</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2025-04-22T12:34:26Z</dc:date>
    </item>
  </channel>
</rss>

