<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Implement batched file processing in NiFi in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/399871#M250561</link>
    <description>&lt;P&gt;Hello,&lt;BR /&gt;I am trying to implement batched file processing using NiFi processors.&lt;BR /&gt;My use case: 70-80K files arrive daily, each 200-300 MB in size.&lt;BR /&gt;The files are read from an S3 store and sent to Spark for execution through a Livy processor. The plan is to avoid sending each file location to Spark individually; instead, we want to batch a few files and send the batch to Spark through Livy, so that the number of Livy connections to Spark is reduced.&lt;BR /&gt;&lt;BR /&gt;Considerations when creating a batch:&lt;BR /&gt;1. Batch size: a batch is defined by total size, e.g. 1000 MB.&lt;BR /&gt;2. Wait duration: if there are not enough files to reach the batch size, the batch starts after a specific wait duration.&lt;BR /&gt;&lt;BR /&gt;I am trying to implement this using Wait, Notify, and UpdateAttribute (with stateful variables) based on batch size and wait time, but it is not fully working.&lt;BR /&gt;&lt;BR /&gt;Any leads or suggestions on how to implement this would be much appreciated.&lt;BR /&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Jan 2025 10:42:20 GMT</pubDate>
    <dc:creator>askh88</dc:creator>
    <dc:date>2025-01-09T10:42:20Z</dc:date>
    <item>
      <title>Implement batched file processing in NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/399871#M250561</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;I am trying to implement batched file processing using NiFi processors.&lt;BR /&gt;My use case: 70-80K files arrive daily, each 200-300 MB in size.&lt;BR /&gt;The files are read from an S3 store and sent to Spark for execution through a Livy processor. The plan is to avoid sending each file location to Spark individually; instead, we want to batch a few files and send the batch to Spark through Livy, so that the number of Livy connections to Spark is reduced.&lt;BR /&gt;&lt;BR /&gt;Considerations when creating a batch:&lt;BR /&gt;1. Batch size: a batch is defined by total size, e.g. 1000 MB.&lt;BR /&gt;2. Wait duration: if there are not enough files to reach the batch size, the batch starts after a specific wait duration.&lt;BR /&gt;&lt;BR /&gt;I am trying to implement this using Wait, Notify, and UpdateAttribute (with stateful variables) based on batch size and wait time, but it is not fully working.&lt;BR /&gt;&lt;BR /&gt;Any leads or suggestions on how to implement this would be much appreciated.&lt;BR /&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jan 2025 10:42:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/399871#M250561</guid>
      <dc:creator>askh88</dc:creator>
      <dc:date>2025-01-09T10:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: Implement batched file processing in NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/399887#M250564</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/123148"&gt;@askh88&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;I don't know anything about the "Livy processor" you are using, but NiFi processors typically execute against a single FlowFile at a time. So using Wait and Notify to hold FlowFiles back from the Livy processor until you have X FlowFiles of a given total size would likely make little difference to the number of Spark connections.&lt;BR /&gt;&lt;BR /&gt;The question is whether multiple FlowFiles can be merged into one FlowFile that is then passed to your Livy processor. I don't know the structure of your data or whether a merge is possible with a MergeContent or MergeRecord processor. But if merging the FlowFiles is possible, that is the better route to take here.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
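The size-or-timeout rule discussed in this thread can be sketched outside NiFi in plain Python. This is only a minimal illustration, not NiFi or Livy API: the class Batcher, its parameters, and the s3://example-bucket paths are hypothetical names. A batch closes when the accumulated size reaches a threshold, or when a partial batch has waited long enough. MergeContent offers comparable controls through its Minimum Group Size and Max Bin Age properties.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batcher:
    """Groups file paths into batches closed by total size or by age (hypothetical helper)."""
    max_bytes: int      # size threshold for a full batch, e.g. 1000 MB
    max_wait_s: float   # how long a partial batch may wait before release
    _paths: List[str] = field(default_factory=list, init=False)
    _bytes: int = field(default=0, init=False)
    _opened_at: float = field(default=0.0, init=False)

    def add(self, path: str, size: int, now: float) -> Optional[List[str]]:
        """Add one file; return the batch if the size threshold is reached, else None."""
        if not self._paths:
            self._opened_at = now  # the wait clock starts with the first file
        self._paths.append(path)
        self._bytes += size
        if self._bytes >= self.max_bytes:
            return self._flush()
        return None

    def poll(self, now: float) -> Optional[List[str]]:
        """Return a partial batch once it has waited max_wait_s, else None."""
        if self._paths and now - self._opened_at >= self.max_wait_s:
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        batch, self._paths, self._bytes = self._paths, [], 0
        return batch


MB = 1024 * 1024
batcher = Batcher(max_bytes=1000 * MB, max_wait_s=300)
print(batcher.add("s3://example-bucket/a", 600 * MB, now=0))  # None
print(batcher.add("s3://example-bucket/b", 400 * MB, now=1))  # both paths: size threshold hit
print(batcher.add("s3://example-bucket/c", 250 * MB, now=5))  # None
print(batcher.poll(now=100))                                  # None: only 95 s have passed
print(batcher.poll(now=305))                                  # the path for c: wait expired
```

Each returned list of paths would then be submitted to Spark as a single request, so the number of Livy calls follows the number of batches rather than the number of files.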
      <pubDate>Thu, 09 Jan 2025 15:28:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/399887#M250564</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2025-01-09T15:28:59Z</dc:date>
    </item>
    <item>
      <title>Re: Implement batched file processing in NiFi</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/400457#M250830</link>
      <description>&lt;P&gt;Thanks &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;.&lt;BR /&gt;I am testing the MergeContent option against our data volume; if it works, we will go with it. Thanks for the help and suggestion.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2025 06:46:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Implement-batched-file-processing-in-NiFi/m-p/400457#M250830</guid>
      <dc:creator>askh88</dc:creator>
      <dc:date>2025-01-16T06:46:28Z</dc:date>
    </item>
  </channel>
</rss>

