How to use ListSFTP and FetchSFTP to filter lines of files

Explorer

Hi everyone, I use ListSFTP and FetchSFTP to collect files whose lines look like the sample below.
I want to filter them based on the third field.
I want to keep only the lines that have the year 1995 in that field.

|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0

|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0

|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0

|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0

|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0

3 REPLIES

Super Mentor

@Justee 

ListSFTP only generates a FlowFile with attributes/metadata about the file on the SFTP server.  It does not look at the content itself.  So your filtering options are limited to what is in those generated attributes.


The FetchSFTP processor uses these attributes/metadata to retrieve the actual content and add it to the existing FlowFile produced by the ListSFTP processor.

So unfortunately you would need to fetch all the files and then keep only those that contain the desired value in the third field.  You may want to look at the RouteText [1] processor for handling these files after their content is fetched.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.14.0/org.apach...

If you found this response addressed your query, please take a moment to login and click on "Accept as Solution" below this post.

Thank you,

Matt

Explorer

Hi @MattWho
What would the regular expression be if I have to put the selection condition on field three of the data (the field I put in bold)?

I want to select only the lines with 1995 in that field.

|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0

|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0

|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0

|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0

|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0

Super Mentor

@Justee

The first thing I would do (this step is optional) is add a new attribute to my FlowFile that specifies the year to search for in the lines of that FlowFile's content.
For example, add an attribute "year" with a value of "1995".

In the RouteText processor, I could then use NiFi Expression Language (NEL) in my Java regular expression, as supported by this processor component:

^\|(.*?)\|(.*?)\|${year}\|(.*?)$

The above Java regular expression will match lines that begin with a pipe "|", followed by a non-greedy wildcard match of zero or more characters up to the next pipe "|", then the same again for field two. For field three I used NEL, which resolves to "1995", and finally I match the remainder of the line with another wildcard.
Of course, you could simply put "1995" in place of "${year}" in the above regex.
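
To see how this expression behaves, here is a minimal Java sketch that applies it to the sample lines outside of NiFi, with `${year}` substituted by hand. The class and method names (`ThirdFieldFilter`, `answerPattern`, `strictPattern`, `keepMatching`) are made up for illustration and are not part of NiFi. One thing the sketch makes visible: `.*?` is allowed to cross a pipe when that is the only way to get a match, so a line carrying 1995 in a later field would also match. The `[^|]*` variant is my own assumption about how one might pin the match strictly to field three.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ThirdFieldFilter {

    // The regex from the answer, with ${year} replaced by a literal value.
    static Pattern answerPattern(String year) {
        return Pattern.compile("^\\|(.*?)\\|(.*?)\\|" + Pattern.quote(year) + "\\|(.*?)$");
    }

    // Hypothetical stricter variant (not from the answer): [^|]* cannot cross a pipe,
    // so the year has to sit in the third field.
    static Pattern strictPattern(String year) {
        return Pattern.compile("^\\|[^|]*\\|[^|]*\\|" + Pattern.quote(year) + "\\|.*$");
    }

    // Returns only the lines that match the whole pattern.
    static List<String> keepMatching(List<String> lines, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0",
            "|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0",
            "|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0");

        // Lines kept by the regex from the answer.
        for (String line : keepMatching(lines, answerPattern("1995"))) {
            System.out.println(line);
        }

        // A made-up line with 1995 in field four instead of field three.
        String tricky = "|1|01-Jan-1996|1996|1995|0";
        System.out.println("answer regex matches tricky line: "
            + answerPattern("1995").matcher(tricky).matches());
        System.out.println("strict regex matches tricky line: "
            + strictPattern("1995").matcher(tricky).matches());
    }
}
```

For the sample data in this thread both patterns select the same lines, since the fields after the third only hold 0 or 1.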

The RouteText processor component configuration would look like this:

[Screenshot: MattWho_0-1630599316390.png]

The result would be two FlowFiles.  One FlowFile would be routed to the relationship "1995" (based on the property name used) and would contain only the lines with "1995".  The second FlowFile would be routed to the "unmatched" relationship and would contain all the non-matching lines (you may choose to just auto-terminate this relationship if you don't care about these lines).
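
The split described above can be pictured with a small Java sketch that sorts lines into two buckets named after the relationships. It only imitates the outcome in plain Java and does not use the NiFi API. `RouteTextSketch` and `route` are hypothetical names, and the year is hard-coded in place of `${year}`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RouteTextSketch {

    // Sorts each line into the bucket named after the property, or into "unmatched".
    static Map<String, List<String>> route(List<String> lines, String propertyName, Pattern pattern) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        buckets.put(propertyName, new ArrayList<>());
        buckets.put("unmatched", new ArrayList<>());
        for (String line : lines) {
            String key = pattern.matcher(line).matches() ? propertyName : "unmatched";
            buckets.get(key).add(line);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The regex from the answer with "1995" written in directly.
        Pattern pattern = Pattern.compile("^\\|(.*?)\\|(.*?)\\|1995\\|(.*?)$");

        List<String> lines = Arrays.asList(
            "|226789|23-Feb-1996|1995|0|1|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0",
            "|226780|08-Mar-1996|1996|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0",
            "|222507|01-Jan-1995|1995|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0",
            "|22308|01-Jan-1995|1995|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0",
            "|222707|01-Jan-1995|1995|0|1|0|0|0|0|0|0|1|0|0|0|0|0|0|0|1|0|0");

        // Print each bucket name followed by the lines it received.
        for (Map.Entry<String, List<String>> entry : route(lines, "1995", pattern).entrySet()) {
            System.out.println(entry.getKey() + ":");
            for (String line : entry.getValue()) {
                System.out.println("  " + line);
            }
        }
    }
}
```

With the five sample lines from the question, four end up in the "1995" bucket and the 1996 line ends up in "unmatched".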

If you found these responses addressed your query, please take a moment to login and click on "Accept as Solution" below each response that helped you.

Thank you,

Matt