<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Process only one file at a time in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376462#M242923</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106206"&gt;@manishg&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;ListFile does not pick up any files. It simply generates a zero-content NiFi FlowFile for each file found in the target directory. That FlowFile only carries metadata about the target content. The FetchFile processor uses that metadata to fetch the actual content and add it to the FlowFile. The value of this split shows when you have a lot of target files to ingest. To avoid having all the disk I/O related to that content on one node, you can redistribute the zero-byte FlowFiles across all nodes so that each node fetches its share of the content (this works assuming the same target directory is mounted on all NiFi cluster nodes).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;As&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;shared, you could use Process Group (PG) FlowFile concurrency to process one FlowFile at a time.&lt;BR /&gt;&lt;BR /&gt;ListFile will still continue to list all files in the target directory (it writes state and continues to list new files as they get added to the input directory). You can then feed the outbound connection of your ListFile to a PG configured with "Single FlowFile Per Node" FlowFile concurrency. This prevents any other FlowFile queued between ListFile and the PG from entering the PG until the first FlowFile has been processed through that PG.&lt;BR /&gt;So your first processor inside the PG would be your FetchFile processor. Now, if you were to configure a Load Balanced Connection on the connection between ListFile and the PG, you would end up with each node in your NiFi cluster processing a single file at a time. This gives you some concurrency if you want it. However, if you have a strict one-file-at-a-time requirement, you would not configure a load balanced connection.&lt;BR /&gt;&lt;BR /&gt;Hope this helps,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
    <pubDate>Mon, 18 Sep 2023 18:44:57 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2023-09-18T18:44:57Z</dc:date>
    <item>
      <title>Process only one file at a time</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376425#M242910</link>
      <description>&lt;P&gt;I am using the ListFile processor to pick up input files for my flow, and it picks up all available files in the directory.&lt;/P&gt;&lt;P&gt;Is it possible to configure it to pick up only one file, let its processing complete, and then pick up the next file? Basically, only one file at a time should be processed by the system.&lt;/P&gt;&lt;P&gt;Or is there a processor other than ListFile that can be used for this purpose?&lt;/P&gt;</description>
      <pubDate>Sun, 17 Sep 2023 06:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376425#M242910</guid>
      <dc:creator>manishg</dc:creator>
      <dc:date>2023-09-17T06:47:37Z</dc:date>
    </item>
    <item>
      <title>Re: Process only one file at a time</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376452#M242920</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106206"&gt;@manishg&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;This has been asked before in a different way, but you can implement the same method:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/t5/Support-Questions/Wait-for-a-Flowfile-to-be-picked-only-after-the-previous/m-p/375890#M242696" target="_blank"&gt;https://community.cloudera.com/t5/Support-Questions/Wait-for-a-Flowfile-to-be-picked-only-after-the-previous/m-p/375890#M242696&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If that helps, please accept the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 18 Sep 2023 14:17:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376452#M242920</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2023-09-18T14:17:09Z</dc:date>
    </item>
    <item>
      <title>Re: Process only one file at a time</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376462#M242923</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106206"&gt;@manishg&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;ListFile does not pick up any files. It simply generates a zero-content NiFi FlowFile for each file found in the target directory. That FlowFile only carries metadata about the target content. The FetchFile processor uses that metadata to fetch the actual content and add it to the FlowFile. The value of this split shows when you have a lot of target files to ingest. To avoid having all the disk I/O related to that content on one node, you can redistribute the zero-byte FlowFiles across all nodes so that each node fetches its share of the content (this works assuming the same target directory is mounted on all NiFi cluster nodes).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;As&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;shared, you could use Process Group (PG) FlowFile concurrency to process one FlowFile at a time.&lt;BR /&gt;&lt;BR /&gt;ListFile will still continue to list all files in the target directory (it writes state and continues to list new files as they get added to the input directory). You can then feed the outbound connection of your ListFile to a PG configured with "Single FlowFile Per Node" FlowFile concurrency. This prevents any other FlowFile queued between ListFile and the PG from entering the PG until the first FlowFile has been processed through that PG.&lt;BR /&gt;So your first processor inside the PG would be your FetchFile processor. Now, if you were to configure a Load Balanced Connection on the connection between ListFile and the PG, you would end up with each node in your NiFi cluster processing a single file at a time. This gives you some concurrency if you want it. However, if you have a strict one-file-at-a-time requirement, you would not configure a load balanced connection.&lt;BR /&gt;&lt;BR /&gt;Hope this helps,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Mon, 18 Sep 2023 18:44:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Process-only-one-file-at-a-time/m-p/376462#M242923</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2023-09-18T18:44:57Z</dc:date>
    </item>
  </channel>
</rss>

