<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Using regular expressions to fetch all files having .txt  in nifi. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284962#M211543</link>
    <description>&lt;P&gt;Thank you, Matt. It's working.&lt;/P&gt;</description>
    <pubDate>Fri, 06 Dec 2019 11:01:11 GMT</pubDate>
    <dc:creator>sunilb</dc:creator>
    <dc:date>2019-12-06T11:01:11Z</dc:date>
    <item>
      <title>Using regular expressions to fetch all files having .txt  in nifi.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284868#M211486</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am trying to fetch all files with a .txt extension from the files in an S3 bucket using NiFi.&lt;/P&gt;
&lt;P&gt;Is there a way to fetch files based on their format, and which processors should be used for this?&lt;/P&gt;
&lt;P&gt;Could anyone explain with an example? I am new to this.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Sunil&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2019 14:01:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284868#M211486</guid>
      <dc:creator>sunilb</dc:creator>
      <dc:date>2019-12-05T14:01:09Z</dc:date>
    </item>
    <item>
      <title>Re: Using regular expressions to fetch all files having .txt  in nifi.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284896#M211498</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/72006"&gt;@sunilb&lt;/a&gt;&lt;/P&gt;&lt;P&gt;You may want to use the ListS3 processor to list the files in your S3 bucket. It produces one zero-byte FlowFile for each S3 file that is listed (this processor does not retrieve the actual file content).&lt;BR /&gt;Each generated FlowFile carries attributes (metadata) about the file that was listed, including the "filename".&lt;/P&gt;&lt;P&gt;You can then route the success relationship from the ListS3 processor to a RouteOnAttribute processor, which routes those FlowFiles whose "filename" attribute value ends with ".txt" on to a FetchS3Object processor. (FetchS3Object uses the "filename" attribute from the inbound FlowFile to fetch the actual content of that S3 file and add it to the FlowFile.) Any FlowFile whose "filename" attribute does not end in ".txt" can simply be auto-terminated.&lt;BR /&gt;&lt;BR /&gt;RouteOnAttribute configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-05 at 10.15.07 AM.png" style="width: 588px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25625i611FDA79EE228539/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2019-12-05 at 10.15.07 AM.png" alt="Screen Shot 2019-12-05 at 10.15.07 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-05 at 10.16.40 AM.png" style="width: 968px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25626iB90981616B34810B/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2019-12-05 at 10.16.40 AM.png" alt="Screen Shot 2019-12-05 at 10.16.40 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here is an example of what this portion of the dataflow would look like:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2019-12-05 at 10.18.39 AM.png" style="width: 473px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/25627iD1D97577C8C16558/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2019-12-05 at 10.18.39 AM.png" alt="Screen Shot 2019-12-05 at 10.18.39 AM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If your NiFi is set up as a cluster, the connection between the RouteOnAttribute and FetchS3Object processors should be configured to use the Round Robin load balancing strategy. The ListS3 processor should be configured to run only on the NiFi cluster's primary node (you'll notice the small "P" in the upper-left corner of the ListS3 processor's icon). The load balancing strategy then redistributes the listed FlowFiles among all nodes in your cluster before the content is actually fetched, which makes more efficient use of resources.&lt;/P&gt;&lt;P&gt;Hope this helps,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2019 15:23:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284896#M211498</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2019-12-05T15:23:48Z</dc:date>
    </item>
    <item>
      <title>Re: Using regular expressions to fetch all files having .txt  in nifi.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284962#M211543</link>
      <description>&lt;P&gt;Thank you, Matt. It's working.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2019 11:01:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-regular-expressions-to-fetch-all-files-having-txt-in/m-p/284962#M211543</guid>
      <dc:creator>sunilb</dc:creator>
      <dc:date>2019-12-06T11:01:11Z</dc:date>
    </item>
  </channel>
</rss>

