<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to use ListSFTP and FetchSFTP to filter lines of files in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323521#M229137</link>
    <description>&lt;P&gt;Hi everyone, I use ListSFTP and FetchSFTP to collect files that contain lines like the ones below.&lt;BR /&gt;I want to filter the files based on the third field.&lt;BR /&gt;I want to collect only the lines that have the year 1995.&lt;/P&gt;
&lt;P&gt;|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 02 Sep 2021 05:14:31 GMT</pubDate>
    <dc:creator>Justee</dc:creator>
    <dc:date>2021-09-02T05:14:31Z</dc:date>
    <item>
      <title>How to use ListSFTP and FetchSFTP to filter lines of files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323521#M229137</link>
      <description>&lt;P&gt;Hi everyone, I use ListSFTP and FetchSFTP to collect files that contain lines like the ones below.&lt;BR /&gt;I want to filter the files based on the third field.&lt;BR /&gt;I want to collect only the lines that have the year 1995.&lt;/P&gt;
&lt;P&gt;|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;
&lt;P&gt;|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Sep 2021 05:14:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323521#M229137</guid>
      <dc:creator>Justee</dc:creator>
      <dc:date>2021-09-02T05:14:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to use ListSFTP and FetchSFTP to filter lines of files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323528#M229141</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/88169"&gt;@Justee&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;ListSFTP only generates a FlowFile with attributes/metadata about the file on the SFTP server.&amp;nbsp; It does not look at the content itself.&amp;nbsp; So your filtering options are limited to what is in those generated attributes.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_0-1630520544773.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/32205i6EAEA51709727489/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_0-1630520544773.png" alt="MattWho_0-1630520544773.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The FetchSFTP processor uses these attributes/metadata to retrieve the actual content and add it to the existing FlowFile produced by the ListSFTP processor.&lt;BR /&gt;&lt;BR /&gt;So unfortunately you would need to fetch all the files and then keep only those that contain the desired value in the third field.&amp;nbsp; You may want to look at the RouteText [1] processor for handling these files after the content is fetched.&lt;BR /&gt;&lt;BR /&gt;[1] &lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.RouteText/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apache.nifi.processors.standard.RouteText/index.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;If you found this response addressed your query, please take a moment to log in and click on "Accept as Solution" below this post.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Wed, 01 Sep 2021 18:47:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323528#M229141</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2021-09-01T18:47:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to use ListSFTP and FetchSFTP to filter lines of files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323567#M229155</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;What would the regular expression be if I have to put the selection condition on field three of the data?&lt;/P&gt;&lt;P&gt;I put that field in bold.&amp;nbsp;I want to select only the lines with 1995.&lt;/P&gt;&lt;P&gt;|226789|23-Feb-1996|&lt;STRONG&gt;1995&lt;/STRONG&gt;|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;&lt;P&gt;|226780|08-Mar-1996|&lt;STRONG&gt;1996&lt;/STRONG&gt;|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;&lt;P&gt;|222507|01-Jan-1995|&lt;STRONG&gt;1995&lt;/STRONG&gt;|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;&lt;P&gt;|22308|01-Jan-1995|&lt;STRONG&gt;1995&lt;/STRONG&gt;|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0&lt;/P&gt;&lt;P&gt;|222707|01-Jan-1995|&lt;STRONG&gt;1995&lt;/STRONG&gt;|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0&lt;/P&gt;</description>
      <pubDate>Thu, 02 Sep 2021 09:28:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323567#M229155</guid>
      <dc:creator>Justee</dc:creator>
      <dc:date>2021-09-02T09:28:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to use ListSFTP and FetchSFTP to filter lines of files</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323605#M229165</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/88169"&gt;@Justee&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The first thing I would do is add a new attribute on my FlowFile that specifies the year I'd be searching for in the lines contained within the content of that FlowFile. (This step is optional.)&lt;BR /&gt;For example, adding an attribute "year" with a value of "1995".&lt;BR /&gt;&lt;BR /&gt;In the RouteText processor, I'd then be able to use NiFi Expression Language (NEL) in my Java regular expression, as supported by this processor component:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;^\|(.*?)\|(.*?)\|${year}\|(.*?)$&lt;/LI-CODE&gt;&lt;P&gt;The above Java regular expression will match lines that begin with a pipe "|" followed by a non-greedy wildcard match of zero or more characters up to the next pipe "|", then the same again for field two; for field three I used NEL, which resolves to "1995"; and finally I match the remainder of the line via wildcard.&lt;BR /&gt;Of course, you could simply put "1995" in place of "${year}" in the above regex.&lt;BR /&gt;&lt;BR /&gt;The RouteText processor component configuration would look like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_0-1630599316390.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/32217i48D75CBFA76687A8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_0-1630599316390.png" alt="MattWho_0-1630599316390.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;The result would be two FlowFiles.&amp;nbsp; One FlowFile would be routed to the relationship "1995" (based on the property name used) and its content would contain only the lines with "1995".&amp;nbsp; The second FlowFile would route to the "unmatched" relationship and would contain all the non-matching lines (you may choose to just auto-terminate this relationship if you don't care about these lines).&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If you found these responses addressed your query, please take a moment to log in and click on "Accept as Solution" below each response that helped you.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
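To sanity-check the pattern outside NiFi, here is a minimal Python sketch that mimics the line-by-line routing described above. It is not NiFi code: Python's re module stands in for Java regex, build_pattern and route_lines are hypothetical helper names, and plain string formatting stands in for resolving ${year}. The strict flag is an assumption of this sketch rather than part of the answer above: it swaps the non-greedy wildcards for [^|]* so that a field can never stretch across a pipe, which matters only if a later field could also hold the year.

```python
import re

# Sample lines copied from the question in this thread.
SAMPLE = """\
|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0
|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0
|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0
|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0
|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0
"""


def build_pattern(year, strict=False):
    """Hypothetical helper: build the regex with the year filled in.

    strict=False keeps the non-greedy wildcards from the answer above.
    strict=True (an assumption of this sketch) uses [^|]* so the first
    two fields cannot contain a pipe.
    """
    field = r"[^|]*" if strict else r".*?"
    return re.compile(
        r"^\|({f})\|({f})\|{y}\|(.*?)$".format(f=field, y=re.escape(year))
    )


def route_lines(text, year, strict=False):
    """Split text into lines and return (matched, unmatched) lists."""
    pattern = build_pattern(year, strict)
    matched, unmatched = [], []
    for line in text.splitlines():
        (matched if pattern.match(line) else unmatched).append(line)
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = route_lines(SAMPLE, "1995")
    print("matched lines:", len(matched))
    print("unmatched lines:", len(unmatched))
```

On the five sample lines, the four lines whose third field is 1995 land in the first list and the 1996 line lands in the second, which mirrors the "1995" and "unmatched" relationships.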
      <pubDate>Thu, 02 Sep 2021 16:18:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-use-ListSFTP-and-FetchSFTP-to-filter-lines-of-files/m-p/323605#M229165</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2021-09-02T16:18:50Z</dc:date>
    </item>
  </channel>
</rss>

