Support Questions

JamesE · ‎01-16-2020

I am very new to NiFi and regex.

I have a test .txt file containing a mock log.
The format is along these lines:

srcip=10.10.10.10 timestamp=152532431 action="denied"

What I need is to match on a key (the word before the '=') and then return everything after the '=' up to the next space character.

Any help would be appreciated.

MattWho · ‎01-16-2020

@JamesE

The ExtractText processor is used to extract text from the content of the FlowFile using a Java Regular Expression and insert that extracted text into FlowFile attributes.

So using your FlowFile content example here:

srcip=10.10.10.10 timestamp=152532431 action="denied"

What is your desired end result?
Three separate FlowFile attributes, one for each "word" (srcip, timestamp, and action)?

Assuming the above, you would add three new properties to the ExtractText processor (one for each extracted value) as follows:

Screen Shot 2020-01-16 at 1.20.39 PM.png

For each dynamic property added via the "+" icon, the property name becomes the FlowFile attribute name, and the string captured by the capture group in the Java regular expression becomes the value assigned to that new FlowFile attribute.
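
To make that mapping concrete, here is a minimal Java sketch of the idea outside NiFi. It is not NiFi's implementation: the class and method names are made up for illustration, the three patterns are assumed stand-ins (not necessarily the ones in the screenshot), and locating a match with `Matcher.find()` is also an assumption. It only mimics "property name becomes attribute name, capture group 1 becomes the value" using `java.util.regex`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: mimics how dynamic properties
// (name -> regex) could be turned into FlowFile attributes.
final class ExtractTextSketch {

    // For each property, search the content and store capture group 1
    // under the property name. Each regex is assumed to contain one
    // capture group. Properties that do not match add nothing.
    static Map<String, String> extract(String content, Map<String, String> properties) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> property : properties.entrySet()) {
            Matcher matcher = Pattern.compile(property.getValue()).matcher(content);
            if (matcher.find()) {
                attributes.put(property.getKey(), matcher.group(1));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String content = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\"";

        // Assumed patterns for illustration: capture the run of
        // non-space characters that follows "key=".
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("srcip", "srcip=(\\S+)");
        properties.put("timestamp", "timestamp=(\\S+)");
        properties.put("action", "action=(\\S+)");

        extract(content, properties)
                .forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

Note that with these stand-in patterns the action value keeps its double quotes, since they are part of the run of non-space characters after the '='.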

Hope this helps,

Matt

JamesE · ‎01-16-2020

Hi Matt

Thank you for the reply.

That is what I am after. However, say those three attributes were in a long list of, say, 25 attributes and I only wanted certain ones.

Would I have to list all of them, as you have, to get the desired FlowFile attributes out?

For instance:

srcip=10.10.10.10 timestamp=152532431 action="denied" logver=12 tz="UTC+0" logid="0000012" dstip=12.12.12.12

Say I wanted srcip, action, and dstip but none of the others. Would I need to list each attribute within each new property?

For example, would this not work, and why?

Annotation 2020-01-16 233049.png

MattWho · ‎01-17-2020

@JamesE

You can handle this easily using a different set of Java Regular Expressions:

.*action=(.*?) .*
.*srcip=(.*?) .*
.*timestamp=(.*?) .*

If any one of these fields could be the very last field in the content line, you would need to append a blank space to the end of the content using the ReplaceText processor before sending your FlowFile to your ExtractText processor. Each value must be followed by a blank space so the regex knows where the value ends for each field.

Your ReplaceText processor configuration would look like this:

Screen Shot 2020-01-17 at 5.03.38 PM.png

The "Replacement Value" is just a single space.
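
As a rough check of why the trailing space matters, the sketch below runs the patterns above against the longer log line from the question using plain `java.util.regex`. It is only an approximation of the flow: the class and method names are hypothetical, the ReplaceText step is stood in for by simple string concatenation, and using `Matcher.find()` is an assumption rather than a statement about NiFi internals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of NiFi.
final class TrailingSpaceSketch {

    // Builds the pattern from the answer for a given key, for example
    // ".*dstip=(.*?) .*", and returns capture group 1, or null when
    // nothing matches. Keys are assumed to be plain words.
    static String valueOf(String key, String content) {
        Pattern pattern = Pattern.compile(".*" + key + "=(.*?) .*");
        Matcher matcher = pattern.matcher(content);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\" "
                + "logver=12 tz=\"UTC+0\" logid=\"0000012\" dstip=12.12.12.12";

        // dstip is the last field, so no space follows its value.
        System.out.println(valueOf("dstip", line));     // prints null

        // Stand-in for the ReplaceText step: append a single space.
        String padded = line + " ";
        System.out.println(valueOf("srcip", padded));   // prints 10.10.10.10
        System.out.println(valueOf("action", padded));  // prints "denied"
        System.out.println(valueOf("dstip", padded));   // prints 12.12.12.12
    }
}
```

Only the keys you ask for produce values, so fields such as logver or tz can simply be left out. The lazy group `(.*?)` stops at the first space after the '=', which is why a last field with no space after it yields no match until the space is appended.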

Hope this helps,

Matt

Cloudera Community

Support Questions

Nifi Extract Text - Match on text and return the characters that follow