
How to ingest files based on the YYYYMMDD in their filename

New Contributor

What I want to do:

I'm looking for a way to get the content of a file based on its filename.

All of the target files are in the same directory, but I'd like to select only the files that have "_{Today's YYYYMMDD}.tsv" as their suffix.

For example, if today is 20171113,

/same/dir/testfile_20171113.tsv -> OK. I'd like to ingest this file.

/same/dir/testfile2_20171113.tsv -> OK. This one is also a target.

/same/dir/testfile_20171114.tsv -> NG because this YYYYMMDD is not today.

/same/dir/testfile_2017111.tsv -> NG because the timestamp is not in the format of YYYYMMDD.

/same/dir/testfile_20171113.tsv.processed -> NG because the filename does not end with ".tsv".

/another/dir/testfile_20171113.tsv -> NG because this file is in another directory.

What I have investigated:

I have gone through these posts and tried the ListFile and GetFile processors:

https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.h...

https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html

I tried entering "[^\.].*_${now():format('yyyyMMdd')}\.tsv$" as the "File Filter", but got an error saying "Not a valid Java Regular Expression".

As far as I can tell, "File Filter" on both ListFile and GetFile uses StandardValidators.REGULAR_EXPRESSION_VALIDATOR as its validator, and unfortunately this validator does not interpret NiFi Expression Language.

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/pro...

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/pro...
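
The error message is consistent with how Java treats the unevaluated string: taken literally, `$` is an end-of-line anchor and the `{now...` that follows is not a valid repetition. A minimal sketch of that behavior, assuming the validator passes the raw property value to `java.util.regex.Pattern.compile` (the class name `RawFilterCheck` is made up for illustration and is not part of NiFi):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RawFilterCheck {
    // Returns true if the text compiles as a Java regular expression.
    static boolean isValidRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The File Filter value exactly as typed, with the expression left unevaluated.
        String raw = "[^\\.].*_${now():format('yyyyMMdd')}\\.tsv$";
        // The same filter with the date substituted by hand.
        String evaluated = "[^\\.].*_20171113\\.tsv$";

        System.out.println(isValidRegex(raw));       // prints "false"
        System.out.println(isValidRegex(evaluated)); // prints "true"
    }
}
```

So the regex itself is fine once the date is filled in; the problem is only that the property is validated before any expression could be evaluated.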

Question

* Is there any way to use Expression Language in a property validated by StandardValidators.REGULAR_EXPRESSION_VALIDATOR?

* If not, is there any other way to solve this problem?


Thanks.

1 ACCEPTED SOLUTION

Master Guru

You could put a RouteOnAttribute after ListFile, which would let you compare the filename to an Expression Language statement, and then route the files that match to FetchFile and the unmatched ones to a dead-end processor, or auto-terminate them.
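
The routing decision described here boils down to a full-string regex match of each listed filename against a pattern that embeds today's date. A sketch of that logic in plain Java (this is not NiFi code; `TodayFileRouter` and its methods are hypothetical names), which can be handy for trying the pattern against the sample filenames from the question before configuring RouteOnAttribute. It assumes the comparison is made against the bare file name, with the directory restriction handled by ListFile's input directory:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TodayFileRouter {
    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds a pattern for "<name>_<yyyyMMdd>.tsv" using the given date.
    static Pattern patternFor(LocalDate date) {
        return Pattern.compile("[^.].*_" + date.format(YYYYMMDD) + "\\.tsv");
    }

    // Keeps only the filenames that match the pattern in full.
    static List<String> route(List<String> filenames, LocalDate date) {
        Pattern pattern = patternFor(date);
        return filenames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
                "testfile_20171113.tsv",
                "testfile2_20171113.tsv",
                "testfile_20171114.tsv",
                "testfile_2017111.tsv",
                "testfile_20171113.tsv.processed");

        // prints "[testfile_20171113.tsv, testfile2_20171113.tsv]"
        System.out.println(route(listed, LocalDate.of(2017, 11, 13)));
    }
}
```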

New Contributor

Thanks for your quick reply, and sorry for the late response.
I tried your suggestion and was able to do what I wanted.

What I did:


ListFile: Extract only the files whose name ends with ".tsv"


UpdateAttribute: Set an attribute "today" to the current date in YYYYMMDD format


RouteOnAttribute: Route only the files whose names match `${filename:matches(${today:prepend('[^\.].*_'):append('\d{6}.tsv$')})}` (where `filename` is the name of the target file) to the relationship "target_file"
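
To see what this expression ends up matching, the prepend/append chain can be mirrored with plain string concatenation. The sketch below is plain Java, not NiFi (`RouteExpressionSketch` is a hypothetical name), and assumes that `matches` requires the whole filename to match. Note that, as posted, the appended part `\d{6}` asks for six more digits after the date; the thread does not say whether the real filenames carry such a component. For names that end directly in `_YYYYMMDD.tsv`, as in the original question, the appended part would be `\.tsv` instead.

```java
import java.util.regex.Pattern;

public class RouteExpressionSketch {
    // Mirrors ${today:prepend('[^\.].*_'):append('\d{6}.tsv$')}.
    static String buildRegex(String today) {
        return "[^\\.].*_" + today + "\\d{6}.tsv$";
    }

    // Mirrors ${filename:matches(...)}: the whole name has to match.
    static boolean isTarget(String filename, String today) {
        return Pattern.matches(buildRegex(today), filename);
    }

    public static void main(String[] args) {
        String today = "20171113";
        // prints "[^\.].*_20171113\d{6}.tsv$"
        System.out.println(buildRegex(today));
        // Date followed by six more digits: prints "true"
        System.out.println(isTarget("testfile_20171113123456.tsv", today));
        // Date directly followed by ".tsv": prints "false"
        System.out.println(isTarget("testfile_20171113.tsv", today));
    }
}
```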


Contributor

@shuhei_shogen: I have a similar use case, but when I tried to reproduce the same approach, it did not work. My file names look like equity_asia2.dif.gz.20200324, and I want these files to go into the target folder 20200324. In UpdateAttribute I used ${filename:matches(${today:prepend('[^\.].*gz.'):append('\d{8}')})}, but it doesn't seem to work. Could you please check and help me with this?
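
One thing worth checking in the expression quoted above: the value of `today` is followed by `\d{8}`, so the resulting regex asks for today's date plus eight further digits, which a name like equity_asia2.dif.gz.20200324 does not contain. The sketch below shows this in plain Java, along with one possible way to capture the trailing date for use as a folder name. It is an illustration under assumptions, not a NiFi configuration: `TrailingDateSketch`, its methods, and the capturing pattern are all hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingDateSketch {
    // Assumed name shape: "<anything>.gz.<8 digits>"; the digits are captured.
    private static final Pattern TRAILING_DATE = Pattern.compile("[^.].*\\.gz\\.(\\d{8})");

    // Returns the trailing 8-digit date if the name has the assumed shape.
    static Optional<String> targetFolder(String filename) {
        Matcher m = TRAILING_DATE.matcher(filename);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Mirrors the posted regex: prefix + today + eight more digits.
    static boolean matchesAsPosted(String filename, String today) {
        return Pattern.matches("[^\\.].*gz." + today + "\\d{8}", filename);
    }

    public static void main(String[] args) {
        String name = "equity_asia2.dif.gz.20200324";
        System.out.println(matchesAsPosted(name, "20200324")); // prints "false"
        System.out.println(targetFolder(name).orElse("none")); // prints "20200324"
    }
}
```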