How to ingest files based on the YYYYMMDD in their filename

New Contributor

What I want to do:

I'm looking for a way to get the content of a file based on its filename.

All of the target files are in the same directory, but I'd like to select only the files that have "_{Today's YYYYMMDD}.tsv" as their suffix.

For example, if today is 20171113,

/same/dir/testfile_20171113.tsv -> OK. I'd like to ingest this file.

/same/dir/testfile2_20171113.tsv -> OK. This one is also a target.

/same/dir/testfile_20171114.tsv -> NG because this YYYYMMDD is not today.

/same/dir/testfile_2017111.tsv -> NG because the timestamp is not in the format of YYYYMMDD.

/same/dir/testfile_20171113.tsv.processed -> NG because the filename does not end with ".tsv".

/another/dir/testfile_20171113.tsv -> NG because this file is in another directory.
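The rule described by these examples can be sketched in Python as below. This only illustrates the selection logic, not NiFi configuration: `is_target` is a hypothetical helper name, and the pattern is an assumed reading of "any name ending in `_YYYYMMDD.tsv` for today's date".

```python
import re
from datetime import date
from pathlib import PurePosixPath


def is_target(path: str, today: date, directory: str = "/same/dir") -> bool:
    """Return True if `path` lies directly in `directory` and its name
    ends with _<today as YYYYMMDD>.tsv (hypothetical helper)."""
    p = PurePosixPath(path)
    if str(p.parent) != directory:
        return False
    # The whole filename must match, so a trailing ".processed" is rejected.
    pattern = r"[^.].*_" + today.strftime("%Y%m%d") + r"\.tsv"
    return re.fullmatch(pattern, p.name) is not None


today = date(2017, 11, 13)
examples = [
    "/same/dir/testfile_20171113.tsv",
    "/same/dir/testfile2_20171113.tsv",
    "/same/dir/testfile_20171114.tsv",
    "/same/dir/testfile_2017111.tsv",
    "/same/dir/testfile_20171113.tsv.processed",
    "/another/dir/testfile_20171113.tsv",
]
for path in examples:
    print(path, "OK" if is_target(path, today) else "NG")
```

Only the first two paths come out as OK, which matches the list above.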

What I have investigated:

I have gone through these posts and tried the ListFile and GetFile processors:

https://community.hortonworks.com/questions/38120/how-to-get-files-based-on-the-time-stamp-in-nifi.h...

https://community.hortonworks.com/questions/39553/how-to-get-files-based-on-dates-in-nifi.html

I tried to input "[^\.].*_${now():format('yyyyMMdd')}\.tsv$" as "File Filter", but got an error which said "Not a valid Java Regular Expression".

As far as I can tell, the "File Filter" property on both ListFile and GetFile uses StandardValidators.REGULAR_EXPRESSION_VALIDATOR as its validator, and unfortunately this validator does not interpret NiFi Expression Language.

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/pro...

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-utils/src/main/java/org/apache/nifi/pro...

Question

* Is there any way to use Expression Language in a property validated by StandardValidators.REGULAR_EXPRESSION_VALIDATOR?

* If not, is there any other way to solve this problem?

Thanks.

1 ACCEPTED SOLUTION

Master Guru

You could put a RouteOnAttribute after ListFile, which would let you compare the filename to an Expression Language statement, then route the files that match to FetchFile and the unmatched ones to a dead-end processor, or auto-terminate them.

New Contributor

Thanks for your quick reply, and sorry for the late response.
I tried your suggestion and was able to do what I wanted.

What I did:


ListFile: List only the files whose names end with ".tsv"


UpdateAttribute: Set an attribute "today" to the current date in YYYYMMDD format


RouteOnAttribute: Route only the files whose names match `${filename:matches(${today:prepend('[^\.].*_'):append('\d{6}.tsv$')})}` (where `filename` is the name of the target file) to the relationship "target_file"
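To see what this RouteOnAttribute condition accepts, here is a rough Python imitation under a few assumptions: `matches` is treated as a whole-string regular expression match, the attribute `today` is assumed to hold the date as `yyyyMMdd`, and the string literals are assumed to reach the regex engine exactly as written. `route` is a hypothetical helper, and Python's `re` stands in for Java's regex engine (the pattern only uses syntax common to both).

```python
import re


def route(filename: str, today: str) -> str:
    """Imitate the RouteOnAttribute rule (hypothetical helper)."""
    # Equivalent of ${today:prepend('[^\.].*_'):append('\d{6}.tsv$')}
    pattern = r"[^\.].*_" + today + r"\d{6}.tsv$"
    # fullmatch imitates a whole-string match on the filename.
    if re.fullmatch(pattern, filename):
        return "target_file"
    return "unmatched"


today = "20171113"  # assumed value of the "today" attribute
for name in [
    "testfile_20171113123456.tsv",
    "testfile_20171113.tsv",
    "testfile_20171114123456.tsv",
]:
    print(name, "->", route(name, today))
```

Under these assumptions the appended `\d{6}` requires six more digits between the date and `.tsv` (for example a time of day), so a name such as `testfile_20171113.tsv` from the original question would match only if `\d{6}` were dropped.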


Contributor

@shuhei_shogen: I have a similar use case, but when I tried to reproduce the same approach it did not work. My file names look like equity_asia2.dif.gz.20200324, and I want these files to go into the target folder 20200324. In UpdateAttribute I used ${filename:matches(${today:prepend('[^\.].*gz.'):append('\d{8}')})}, but it doesn't seem to work. Could you please take a look and help me with this?
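One way to examine the expression above outside NiFi is to build the same pattern in Python and test it against the sample filename. This is a sketch, not a confirmed fix: it assumes `today` holds `20200324`, that `matches` requires the whole filename to match, and `el_matches` is a hypothetical helper. The `date_only` variant is an illustrative alternative, not something from this thread.

```python
import re


def el_matches(filename: str, pattern: str) -> bool:
    """Whole-string regex match (hypothetical stand-in for `matches`)."""
    return re.fullmatch(pattern, filename) is not None


filename = "equity_asia2.dif.gz.20200324"
today = "20200324"  # assumed value of the "today" attribute

# Pattern as posted: the date followed by eight more digits.
as_posted = r"[^\.].*gz." + today + r"\d{8}"
# Illustrative alternative: the date only, with the dot escaped.
date_only = r"[^\.].*gz\." + today

print("as posted:", el_matches(filename, as_posted))
print("date only:", el_matches(filename, date_only))
```

With these assumptions the posted pattern does not match, because `today` already supplies the eight date digits and `\d{8}` then asks for eight more.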