Need info on the regex to use in the NiFi ListFile processor

Explorer

Hi All,

I have a directory structure like the one below on a Unix file system:
/DIR1/DIR2/DIR3/DIR4/DIR5

I have files under both DIR3 and DIR5, but I want to read files only from DIR5.
I tried various regexes but could not find one that works.

Could you please help me with the regex to use in the ListFile processor to read files from DIR5 only?

 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Super Mentor

@Rohitravi 

 

The Path Filter is applied to all subdirectories of the configured "Input Directory", but any files in the base "Input Directory" itself are still listed. If you had files in "dir4", they should not have been listed.

Is dir2 empty? If so, can you change your "Input Directory" to /dir1/dir2 instead of /dir1/dir2/dir3? Then dir3 becomes a subdirectory, and its files are subject to the Path Filter as well.

I cannot think of a reason why, when filtering on the subdirectory path, you would still expect files to be returned from the base directory, so I filed an Apache Jira (https://issues.apache.org/jira/browse/NIFI-7104).
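To make this concrete, here is a small Python model of the behavior described above, using the directory names from the original question. It assumes (from this thread and NIFI-7104, not from reading NiFi's source here) that the Path Filter must match the whole parent-directory path of each file, taken relative to the Input Directory, and that files directly in the Input Directory skip the filter. The function name and sample file names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def listed_by_path_filter(input_dir: str, file_path: str, path_filter: str) -> bool:
    """Hypothetical model of ListFile's Path Filter with Recurse Subdirectories = true.

    Assumes the filter must match the *whole* parent-directory path of the file,
    relative to the Input Directory. Files directly in the Input Directory have
    no relative directory, so the filter is never applied to them.
    """
    rel_dir = PurePosixPath(file_path).parent.relative_to(input_dir)
    if str(rel_dir) == ".":
        return True  # base-directory file: listed regardless of the filter
    return re.fullmatch(path_filter, str(rel_dir)) is not None


files = [
    "/DIR1/DIR2/DIR3/in_dir3.txt",
    "/DIR1/DIR2/DIR3/DIR4/in_dir4.txt",
    "/DIR1/DIR2/DIR3/DIR4/DIR5/in_dir5.txt",
]
for input_dir in ("/DIR1/DIR2/DIR3", "/DIR1/DIR2"):
    listed = [f for f in files if listed_by_path_filter(input_dir, f, r".*?/DIR5")]
    print(input_dir, "->", [PurePosixPath(f).name for f in listed])
# /DIR1/DIR2/DIR3 -> ['in_dir3.txt', 'in_dir5.txt']
# /DIR1/DIR2 -> ['in_dir5.txt']
```

Under these assumptions, the DIR3 file slips through only when DIR3 is the Input Directory, which is why moving the Input Directory up one level fixes it.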

Another option is to add a RouteOnAttribute processor after your ListFile processor that routes only the FlowFiles whose absolute.path attribute contains "dir5".

[Screenshot: RouteOnAttribute configuration]
Then auto-terminate the "unmatched" relationship and route the "dir5" relationship to the next component in your dataflow.
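The screenshot is not reproduced here. A typical setup (an assumption, not necessarily what it showed) is Routing Strategy "Route to Property name" with one dynamic property named `dir5` whose value is something like `${absolute.path:contains('dir5')}`. The Python sketch below only models that routing decision; FlowFiles are represented as plain attribute dictionaries, and the function name and sample paths are hypothetical.

```python
from typing import Dict, List

FlowFile = Dict[str, str]  # a FlowFile modeled as just its attributes


def route_on_absolute_path(flowfiles: List[FlowFile], needle: str = "dir5") -> Dict[str, List[FlowFile]]:
    """Hypothetical model of RouteOnAttribute ("Route to Property name") with a single
    dynamic property named after `needle`, roughly ${absolute.path:contains('dir5')}."""
    routes: Dict[str, List[FlowFile]] = {needle: [], "unmatched": []}
    for ff in flowfiles:
        # Plain, case-sensitive substring test, like Expression Language's contains().
        if needle in ff.get("absolute.path", ""):
            routes[needle].append(ff)
        else:
            routes["unmatched"].append(ff)  # auto-terminated in the suggested flow
    return routes


flowfiles = [
    {"filename": "a.txt", "absolute.path": "/dir1/dir2/dir3/"},
    {"filename": "b.txt", "absolute.path": "/dir1/dir2/dir3/dir4/dir5/"},
]
routes = route_on_absolute_path(flowfiles)
print([ff["filename"] for ff in routes["dir5"]])       # ['b.txt']
print([ff["filename"] for ff in routes["unmatched"]])  # ['a.txt']
```

Because this is a substring test, a path such as /dir1/dir50/ would also be routed to "dir5"; if that matters, the condition could be made stricter, for example by matching on the full "/dir5/" path segment.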

Hope this helps,

Matt

If you found that this solution resolves your query, please take a moment to click Accept.


Super Mentor

@Rohitravi 

 

If you only want to list files from DIR5, simply set the ListFile processor's "Input Directory" property to the complete path of DIR5 and set "Recurse Subdirectories" to false.
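As a rough illustration of what that configuration lists, here is a Python analogue that returns only the regular files directly inside one directory and never descends into subdirectories. This is not how ListFile is implemented; the function name is hypothetical.

```python
import os
from typing import List


def list_files_non_recursive(input_dir: str) -> List[str]:
    """Return regular files directly inside input_dir, skipping subdirectories.

    Rough analogue of ListFile with Recurse Subdirectories = false (illustration only).
    """
    with os.scandir(input_dir) as entries:
        return sorted(entry.path for entry in entries if entry.is_file())


# Example call with the directory from the question (it must exist on your machine):
# print(list_files_non_recursive("/DIR1/DIR2/DIR3/DIR4/DIR5"))
```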

 

If the above is not an option, you could try the following "Path Filter":

.*?/DIR5
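One way to sanity-check a Path Filter before configuring it is to full-match it against sample relative directory paths. This sketch assumes the filter must match the entire relative path (as Java's `Matcher.matches()` does), which Python's `re.fullmatch` mirrors for a simple pattern like this one; the helper name is hypothetical.

```python
import re

SUGGESTED_FILTER = r".*?/DIR5"


def path_filter_matches(rel_dir: str, path_filter: str = SUGGESTED_FILTER) -> bool:
    """True if path_filter matches the whole relative directory path (hypothetical helper)."""
    return re.fullmatch(path_filter, rel_dir) is not None


# Directory paths as they might look relative to the Input Directory.
for rel_dir in ["DIR3", "DIR3/DIR4", "DIR3/DIR4/DIR5", "DIR4/DIR5", "DIR5"]:
    print(f"{rel_dir:<16}", "match" if path_filter_matches(rel_dir) else "no match")
```

Only "DIR3/DIR4/DIR5" and "DIR4/DIR5" match. A bare "DIR5" (what you would get with DIR4 as the Input Directory) does not, because the pattern requires a "/" before DIR5; a pattern such as `(.*/)?DIR5` would also accept that case.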

 

Hope this helps,

Matt

Explorer

@MattWho 

Hi Matt,

Thanks for your response.

I tried the regex, but it also picks up the file under DIR3. Below is my configuration:

 

[Screenshot: ListFile processor configuration (ListFile.PNG)]

Explorer

Thanks a lot, Matt!