<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to ingest files based on the YYYYMMDD in their filename in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194640#M71343</link>
    <description>&lt;P&gt;&lt;STRONG&gt;What I want to do:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I'm looking for a way to get the content of a file based on its filename.&lt;/P&gt;&lt;P&gt;All of the target files are in the same directory, but I'd like to select only the files that have "_{Today's YYYYMMDD}.tsv" as their suffix.&lt;/P&gt;&lt;P&gt;For example, if today is 20171113:&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171113.tsv -&amp;gt; OK. I'd like to ingest this file.&lt;/P&gt;&lt;P&gt;/same/dir/testfile2_20171113.tsv -&amp;gt; OK. This one is also a target.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171114.tsv -&amp;gt; NG because this YYYYMMDD is not today.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_2017111.tsv -&amp;gt; NG because the timestamp is not in YYYYMMDD format.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171113.tsv.processed -&amp;gt; NG because the filename does not end with ".tsv".&lt;/P&gt;&lt;P&gt;/another/dir/testfile_20171113.tsv -&amp;gt; NG because this file is in another directory.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I have investigated:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I have gone through these threads and tried the ListFile and GetFile processors:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.html" target="_blank"&gt;https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html" target="_blank"&gt;https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I tried entering "[^\.].*_${now():format('yyyyMMdd')}\.tsv$" as the "File Filter", but got an error saying "Not a valid Java Regular Expression".&lt;/P&gt;&lt;P&gt;As far as I can tell, "File Filter" on both ListFile and GetFile uses StandardValidators.REGULAR_EXPRESSION_VALIDATOR as its validator, and unfortunately this validator does not evaluate NiFi Expression Language.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L363" target="_blank"&gt;https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L363&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L619" target="_blank"&gt;https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L619&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;* Is there any way to use Expression Language with StandardValidators.REGULAR_EXPRESSION_VALIDATOR?&lt;/P&gt;&lt;P&gt;* If not, is there any other way to solve this problem?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 14 Nov 2017 22:42:40 GMT</pubDate>
    <dc:creator>shuhei_shogen</dc:creator>
    <dc:date>2017-11-14T22:42:40Z</dc:date>
    <item>
      <title>How to ingest files based on the YYYYMMDD in their filename</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194640#M71343</link>
      <description>&lt;P&gt;&lt;STRONG&gt;What I want to do:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I'm looking for a way to get the content of a file based on its filename.&lt;/P&gt;&lt;P&gt;All of the target files are in the same directory, but I'd like to select only the files that have "_{Today's YYYYMMDD}.tsv" as their suffix.&lt;/P&gt;&lt;P&gt;For example, if today is 20171113:&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171113.tsv -&amp;gt; OK. I'd like to ingest this file.&lt;/P&gt;&lt;P&gt;/same/dir/testfile2_20171113.tsv -&amp;gt; OK. This one is also a target.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171114.tsv -&amp;gt; NG because this YYYYMMDD is not today.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_2017111.tsv -&amp;gt; NG because the timestamp is not in YYYYMMDD format.&lt;/P&gt;&lt;P&gt;/same/dir/testfile_20171113.tsv.processed -&amp;gt; NG because the filename does not end with ".tsv".&lt;/P&gt;&lt;P&gt;/another/dir/testfile_20171113.tsv -&amp;gt; NG because this file is in another directory.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I have investigated:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I have gone through these threads and tried the ListFile and GetFile processors:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.html" target="_blank"&gt;https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html" target="_blank"&gt;https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I tried entering "[^\.].*_${now():format('yyyyMMdd')}\.tsv$" as the "File Filter", but got an error saying "Not a valid Java Regular Expression".&lt;/P&gt;&lt;P&gt;As far as I can tell, "File Filter" on both ListFile and GetFile uses StandardValidators.REGULAR_EXPRESSION_VALIDATOR as its validator, and unfortunately this validator does not evaluate NiFi Expression Language.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L363" target="_blank"&gt;https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L363&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L619" target="_blank"&gt;https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/processor/util/StandardValidators.java#L619&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;* Is there any way to use Expression Language with StandardValidators.REGULAR_EXPRESSION_VALIDATOR?&lt;/P&gt;&lt;P&gt;* If not, is there any other way to solve this problem?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Nov 2017 22:42:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194640#M71343</guid>
      <dc:creator>shuhei_shogen</dc:creator>
      <dc:date>2017-11-14T22:42:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to ingest files based on the YYYYMMDD in their filename</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194641#M71344</link>
      <description>&lt;P&gt;You could put a RouteOnAttribute after ListFile, which would let you compare the filename against an Expression Language statement, then route the files that match to FetchFile and the unmatched ones to a dead-end processor, or auto-terminate them.&lt;/P&gt;</description>
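The filename check that RouteOnAttribute would perform can be tried outside NiFi first. Below is a minimal Python sketch of that check, not NiFi configuration: the function name is_todays_tsv is hypothetical, the pattern follows the one from the question, and a whole-string match is assumed because the Expression Language matches function compares the entire value.

```python
import re
from datetime import date


def is_todays_tsv(filename: str, today: date) -> bool:
    """Return True when filename ends with _YYYYMMDD.tsv for the given day."""
    stamp = today.strftime("%Y%m%d")
    # Same shape as the pattern in the question: a first character that is
    # not a dot, any characters, an underscore, the date, then a literal .tsv
    pattern = r"[^\.].*_" + re.escape(stamp) + r"\.tsv"
    # fullmatch requires the whole filename to match the pattern
    return re.fullmatch(pattern, filename) is not None


if __name__ == "__main__":
    day = date(2017, 11, 13)
    names = [
        "testfile_20171113.tsv",
        "testfile2_20171113.tsv",
        "testfile_20171114.tsv",
        "testfile_2017111.tsv",
        "testfile_20171113.tsv.processed",
    ]
    for name in names:
        print(name, is_todays_tsv(name, day))
```

The directory is not checked here, since ListFile already restricts the listing to its input directory.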
      <pubDate>Wed, 15 Nov 2017 03:16:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194641#M71344</guid>
      <dc:creator>bbende</dc:creator>
      <dc:date>2017-11-15T03:16:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to ingest files based on the YYYYMMDD in their filename</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194642#M71345</link>
      <description>&lt;P&gt;Thanks for your quick reply, and sorry for my late response.&lt;BR /&gt;I tried your suggestion and was able to do what I wanted.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What I did:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43637-untitled.png" style="width: 1181px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18070i0C3D49CCAD3CFCC3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43637-untitled.png" alt="43637-untitled.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;ListFile: List only the files whose names end with ".tsv"&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43638-untitled-2.png" style="width: 957px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18071i8B5B331000027480/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43638-untitled-2.png" alt="43638-untitled-2.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;UpdateAttribute: Store today's date (YYYYMMDD) in an attribute named "today"&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43640-untitled-3.png" style="width: 862px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18072i95A071797BB7EBA1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43640-untitled-3.png" alt="43640-untitled-3.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;RouteOnAttribute: Route only the files whose names match `${filename:matches(${today:prepend('[^\.].*_'):append('\d{6}.tsv$')})}` (where filename is the attribute holding the target file name) to the relationship "target_file"&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="43641-untitled-4.png" style="width: 848px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/18073iCAAF6BF405D356B4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="43641-untitled-4.png" alt="43641-untitled-4.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;BR /&gt;&lt;IMG src="https://community.cloudera.com/t5/image/serverpage/image-id/7449i927C5BF9DD6F9E5C/image-size/large?v=1.0&amp;amp;px=999" border="0" alt="untitled.png" title="untitled.png" /&gt;</description>
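To see which regular expression the RouteOnAttribute expression above ends up with, the prepend and append steps can be imitated with plain string concatenation. This is a rough Python sketch under two assumptions: the "today" attribute holds a yyyyMMdd string, and the backslashes in the quoted literals reach the regex engine unchanged. The function names route_pattern and routes_to_target are hypothetical.

```python
import re


def route_pattern(today: str) -> str:
    """Assemble the regex the way prepend and append do in the expression."""
    return r"[^\.].*_" + today + r"\d{6}.tsv$"


def routes_to_target(filename: str, today: str) -> bool:
    """Whole-string match of the filename against the assembled pattern."""
    return re.fullmatch(route_pattern(today), filename) is not None


if __name__ == "__main__":
    print(route_pattern("20171113"))
    print(routes_to_target("testfile_20171113123456.tsv", "20171113"))
    print(routes_to_target("testfile_20171113.tsv", "20171113"))
```

As written, the assembled pattern expects six more digits between the date and the extension, so a name such as testfile_20171113.tsv does not match it. Whether the real filenames carry such a six-digit part is not stated in the thread.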
      <pubDate>Sun, 18 Aug 2019 06:45:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/194642#M71345</guid>
      <dc:creator>shuhei_shogen</dc:creator>
      <dc:date>2019-08-18T06:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to ingest files based on the YYYYMMDD in their filename</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/292686#M71346</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/66776"&gt;@shuhei_shogen&lt;/a&gt;: I have a similar use case, but when I tried to reproduce the same approach, it did not work. My file names look like equity_asia2.dif.gz.20200324, and I want these files to go into the target folder 20200324. In UpdateAttribute I used ${filename:matches(${today:prepend('[^\.].*gz.'):append('\d{8}')})}, but it doesn't seem to work. Could you please take a look and help me with this?&lt;/P&gt;</description>
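One way to examine the expression above is to assemble its pattern by hand and test it against the sample filename. The Python sketch below assumes, as in the earlier reply, that "today" holds a yyyyMMdd string; the thread does not say what it holds in this flow. The helper trailing_date is a hypothetical alternative that simply captures the eight digits after ".gz.", and is not something proposed in the thread.

```python
import re
from typing import Optional


def assembled_pattern(today: str) -> str:
    """Regex produced by prepend and append in the expression above."""
    return r"[^\.].*gz." + today + r"\d{8}"


def trailing_date(filename: str) -> Optional[str]:
    """Return the eight digits that follow .gz. at the end of the name."""
    found = re.fullmatch(r".*\.gz\.(\d{8})", filename)
    return found.group(1) if found else None


if __name__ == "__main__":
    name = "equity_asia2.dif.gz.20200324"
    print(assembled_pattern("20200324"))
    print(re.fullmatch(assembled_pattern("20200324"), name) is not None)
    print(trailing_date(name))
```

Under that assumption the assembled pattern asks for the date followed by eight further digits, which the sample filename does not have. Capturing the trailing digits directly yields the folder name regardless of the current date.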
      <pubDate>Fri, 27 Mar 2020 05:22:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-ingest-files-based-on-the-YYYYMMDD-in-their-filename/m-p/292686#M71346</guid>
      <dc:creator>Gubbi</dc:creator>
      <dc:date>2020-03-27T05:22:59Z</dc:date>
    </item>
  </channel>
</rss>

