
NiFi ExtractText - Match on text and return the characters that follow

New Contributor

I'm very new to NiFi and regex.

I have a test .txt file containing a mock log. The format is along the lines of:

 

srcip=10.10.10.10 timestamp=152532431 action="denied"

 

What I need is to match on the field name and then return everything after the '=' up to the next space character.

 

Any help would be appreciated.

Master Mentor

@JamesE 

The ExtractText processor uses a Java regular expression to extract text from the content of a FlowFile and inserts that extracted text into FlowFile attributes.

 

So using your FlowFile content example here:

srcip=10.10.10.10 timestamp=152532431 action="denied"


What is your desired end result? Three separate FlowFile attributes, one for each "word" (srcip, timestamp, and action)?

Assuming above, you would add three new properties to the ExtractText processor (one for each extracted value) as follows:
[Screenshot: ExtractText configured with three dynamic properties (Screen Shot 2020-01-16 at 1.20.39 PM.png)]
For each dynamic property added via the "+" icon, the property name becomes the FlowFile attribute name, and the string matched by the capture group in the Java regular expression becomes the value assigned to that new FlowFile attribute.

 

Hope this helps,

Matt

New Contributor

Hi Matt

Thank you for the reply. 

That is what I am after. However, say those three attributes were in a long list of, say, 25 attributes, and I only wanted certain ones.

Would I have to list all of them, like you have, to get the desired FlowFile attributes out?

For instance:

srcip=10.10.10.10 timestamp=152532431 action="denied" logver=12 tz="UTC+0" logid="0000012" dstip=12.12.12.12

Say I wanted srcip, action, and dstip but none of the others. Would I need to list each attribute within each new property?

 

Would this not work? If not, why?

[Screenshot: Annotation 2020-01-16 233049.png]

Master Mentor

@JamesE 

 

You can handle this easily with a different set of Java regular expressions:

.*action=(.*?) .*
.*srcip=(.*?) .*
.*timestamp=(.*?) .*
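To see how these three expressions behave against the longer example line, here is a minimal sketch using plain `java.util.regex`. The class name and the `extract` helper are hypothetical stand-ins for ExtractText, not NiFi's actual implementation; the sketch assumes each dynamic property is applied with `Matcher.find()` and that capture group 1 becomes the attribute value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTextSketch {

    // Hypothetical stand-in for ExtractText: each entry is a dynamic property
    // (attribute name -> Java regex); group 1 of a match becomes the value.
    static Map<String, String> extract(String content, Map<String, String> properties) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> property : properties.entrySet()) {
            Matcher m = Pattern.compile(property.getValue()).matcher(content);
            if (m.find()) {
                attributes.put(property.getKey(), m.group(1));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String content = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\" "
                + "logver=12 tz=\"UTC+0\" logid=\"0000012\" dstip=12.12.12.12";

        // Only the fields you define a property for are extracted.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("action", ".*action=(.*?) .*");
        properties.put("srcip", ".*srcip=(.*?) .*");
        properties.put("timestamp", ".*timestamp=(.*?) .*");

        // logver, tz, logid and dstip are ignored because no property names them.
        System.out.println(extract(content, properties));
        // prints {action="denied", srcip=10.10.10.10, timestamp=152532431}
    }
}
```

Note that the captured action value keeps its surrounding quotes, since `(.*?)` takes everything between the `=` and the next space.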

 

If it is possible that any one of these fields is the very last field in the content line, then for this to work you need to append a blank space to the end of the content with the ReplaceText processor before sending your FlowFile to your ExtractText processor. Each value must be followed by a blank space so the regex knows where the value for each field ends.

Your ReplaceText processor configuration would look like this:

[Screenshot: ReplaceText processor configuration (Screen Shot 2020-01-17 at 5.03.38 PM.png)]

The "Replacement Value" is just a single space.
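The example line from the question also asks for dstip, which is the last field on that line, so it shows why the trailing space matters. The sketch below uses a hypothetical dstip pattern written in the same style as the three above (not part of the original reply), and `line + " "` stands in for the ReplaceText step. The exact ReplaceText settings in the screenshot are not recoverable here; presumably the "Append" replacement strategy was used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingSpaceSketch {

    // Hypothetical pattern in the same lazy-capture style as the ones above.
    static final Pattern DSTIP = Pattern.compile(".*dstip=(.*?) .*");

    // Returns the captured value, or null when the pattern does not match.
    static String extractDstip(String content) {
        Matcher m = DSTIP.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\" "
                + "logver=12 tz=\"UTC+0\" logid=\"0000012\" dstip=12.12.12.12";

        // No space follows the last value, so "(.*?) " has nothing to stop on.
        System.out.println(extractDstip(line));        // prints null

        // Simulates ReplaceText appending a single space to the content.
        System.out.println(extractDstip(line + " "));  // prints 12.12.12.12
    }
}
```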

Hope this helps,

Matt