
NiFi Expression language to check if the file contains certain text.

Expert Contributor

I have a sample text file that contains the text "Such as this is a check text". I need to write a NiFi expression in the RouteText processor that checks whether the file contains the string 'check': ingest it if so, otherwise reject it.

Everything I could find operates on the filename, not on what is inside the file, for example:

${filename:toLower():contains('check')}

1 ACCEPTED SOLUTION

Master Mentor

@dhieru singh

The NiFi Expression Language is used to evaluate and operate on the attributes of a FlowFile, a Variable Registry key/value pair, a NiFi JVM pre-defined property, or a pre-defined system environment variable. What you are trying to operate on is the content of a FlowFile. Processors like RouteText and RouteOnContent, as mentioned by @kdoran, are the correct processors to use in this scenario. These processors expect you to add custom properties that use Java regular expressions, rather than the NiFi Expression Language, to match against the content of a FlowFile.

Using your example and a RouteOnContent processor, you might want to add a new property as follows:

Property name: containsCheck
Property value: .*?([Cc][Hh][Ee][Cc][Kk]).*?

This Java regular expression matches content consisting of zero or more characters, followed by "check" (case insensitive), followed by zero or more characters.
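To see how that pattern behaves outside NiFi, here is a minimal sketch in plain Java. The class and method names are made up for illustration, and the shorter `(?i)check` pattern is my own assumed equivalent, not something from the processor docs.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of NiFi.
class ContainsCheckSketch {
    // The pattern from the answer above.
    static final Pattern BRACKETED = Pattern.compile(".*?([Cc][Hh][Ee][Cc][Kk]).*?");
    // Assumed shorter equivalent using the case-insensitive inline flag.
    static final Pattern FLAGGED = Pattern.compile("(?i)check");

    // True if the pattern is found anywhere in the content.
    static boolean containsMatch(Pattern pattern, String content) {
        return pattern.matcher(content).find();
    }

    // True only if the pattern matches the content from start to end.
    static boolean matchesWhole(Pattern pattern, String content) {
        return pattern.matcher(content).matches();
    }

    public static void main(String[] args) {
        String sample = "Such as this is a check text";
        System.out.println(containsMatch(BRACKETED, sample));               // true
        System.out.println(containsMatch(FLAGGED, "This is a CHECK"));      // true
        System.out.println(containsMatch(BRACKETED, "nothing to see"));     // false
        // '.' does not match line breaks by default, so a whole-content
        // match fails on multi-line input even though "check" is present.
        System.out.println(matchesWhole(BRACKETED, "line one\ncheck here")); // false
    }
}
```

The last case is worth keeping in mind: if the processor is set up to require that the whole content matches (I believe RouteOnContent's Match Requirement property controls this, but verify against your version), multi-line content would need something like a leading `(?s)` so that `.` also matches line breaks.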

RouteOnContent will then have two relationships: "containsCheck" (the user-added property above) and "unmatched" (the default, which always exists).

Any FlowFile whose content does not contain "check" (case insensitive) will be routed to unmatched. You can auto-terminate this relationship if you just want to discard these unmatched FlowFiles.

Thanks,

Matt


3 REPLIES

Contributor

From the RouteText processor documentation:

Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. The data is then routed according to these rules, routing each line of the text individually.

Is this what you want to accomplish, i.e., routing each line of text? If so, you don't have to use the NiFi Expression Language. Use a Java regex pattern string, and it will be evaluated against each line in your text file.
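As a rough illustration of what "evaluated for each line" means, here is a small plain-Java sketch. It is not how RouteText is implemented; the class name, the method name, and the `(?i)check` pattern are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of per-line matching; not NiFi code.
class LineRoutingSketch {
    // Assumed pattern: "check" in any letter case.
    static final Pattern CHECK = Pattern.compile("(?i)check");

    // Returns only the lines in which the pattern is found.
    static List<String> matchingLines(String content) {
        List<String> matched = new ArrayList<>();
        // \R splits on any line break sequence (\n, \r\n, ...).
        for (String line : content.split("\\R")) {
            if (CHECK.matcher(line).find()) {
                matched.add(line);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String content = "first line\nthis is a Check line\nlast line";
        // Prints the single line that contains "Check".
        System.out.println(matchingLines(content));
    }
}
```

Each line is tested on its own, so only the matching lines end up in the result. That is the difference from routing the whole FlowFile.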

If instead you wish to route the entire FlowFile if any line matches, there are probably many ways you could do that. One method would be to use RouteOnContent:

Applies Regular Expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose Regular Expression matches. Regular Expressions are added as User-Defined Properties where the name of the property is the name of the relationship and the value is a Regular Expression to match against the FlowFile content.

Again, this uses a Java Regex Pattern String.
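The property-name-to-relationship idea from the documentation quote can be sketched in plain Java as below. This is only a model of the described behaviour, not NiFi's implementation; the class, the `route` method, and the second rule name are made up, and the "unmatched" fallback mirrors the default relationship described in the accepted solution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of regex-based routing; not NiFi code.
class RouteOnContentSketch {
    // Returns the name of every rule whose pattern is found in the content,
    // or "unmatched" when no rule applies.
    static List<String> route(Map<String, Pattern> rules, String content) {
        List<String> destinations = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(content).find()) {
                destinations.add(rule.getKey());
            }
        }
        if (destinations.isEmpty()) {
            destinations.add("unmatched");
        }
        return destinations;
    }

    public static void main(String[] args) {
        // Rule name -> regular expression, like user-defined properties.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("containsCheck", Pattern.compile("(?i)check"));
        rules.put("containsError", Pattern.compile("(?i)error")); // made-up second rule

        System.out.println(route(rules, "Such as this is a check text"));
        System.out.println(route(rules, "nothing relevant"));
    }
}
```

Content can match several rules at once, in which case it is listed under each matching name, which corresponds to "routes a copy of the FlowFile to each destination" in the quote.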

Does this answer your question?

Expert Contributor

@kdoran Thanks a lot, that makes sense. It worked.
