Nifi Extract Text - Match on text and return the characters that follow
Labels: Apache NiFi
Created ‎01-16-2020 08:15 AM
I am very new to NiFi and regex.
I have a test txt file containing a mock log.
The format is along the lines of:
srcip=10.10.10.10 timestamp=152532431 action="denied"
What I need is to match against a given word (for example, srcip) and then return everything after the '=' until the next space character.
Any help would be appreciated.
Created ‎01-16-2020 10:23 AM
@JamesE
The ExtractText processor is used to extract text from the content of the FlowFile using a Java Regular Expression and insert that extracted text into FlowFile attributes.
So using your FlowFile content example here:
srcip=10.10.10.10 timestamp=152532431 action="denied"
What is your desired end result?
Three separate FlowFile attributes, one for each "word" (srcip, timestamp, and action)?
Assuming the above, you would add three new properties to the ExtractText processor (one for each extracted value) as follows:
For each dynamic property added via the "+" icon, the property name becomes the FlowFile attribute name, and the string matched by the capture group within the Java regular expression becomes the value assigned to that new FlowFile attribute.
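To make the property-to-attribute mapping concrete, here is a minimal Python sketch that imitates it outside NiFi. It is not NiFi code: the `extract_text` helper and the three patterns are illustrative assumptions (they are not the patterns from the original configuration), and Python's `re` module stands in for Java's regex engine, which treats these simple patterns the same way.

```python
import re

# Hypothetical stand-in for ExtractText's dynamic properties:
# property name -> regular expression with one capture group.
# These patterns are assumptions chosen for illustration only.
properties = {
    "srcip": r"srcip=(\S+)",
    "timestamp": r"timestamp=(\S+)",
    "action": r"action=(\S+)",
}


def extract_text(content, properties):
    """Return attribute name -> capture group 1 for every property that matches."""
    attributes = {}
    for name, pattern in properties.items():
        match = re.search(pattern, content)
        if match:  # a property whose regex does not match adds no attribute
            attributes[name] = match.group(1)
    return attributes


content = 'srcip=10.10.10.10 timestamp=152532431 action="denied"'
attributes = extract_text(content, properties)
print(attributes)
```

Note that the value captured for action keeps its surrounding double quotes, because they are part of the content.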
Hope this helps,
Matt
Created on ‎01-16-2020 03:19 PM - edited ‎01-16-2020 03:36 PM
Hi Matt
Thank you for the reply.
That is what I am after. However, say those three attributes were in a long list of, say, 25 attributes and I only wanted certain ones.
Would I have to list all of them, as you have, to get the desired FlowFile attributes out?
For instance:
srcip=10.10.10.10 timestamp=152532431 action="denied" logver=12 tz="UTC+0" logid="0000012" dstip=12.12.12.12
Say I wanted srcip, action and dstip but none of the others. Would I need to list each attribute within each new property?
Would this not work, and if not, why?
Created ‎01-17-2020 02:05 PM
You can handle this easily using a different set of Java Regular Expressions:
.*action=(.*?) .*
.*srcip=(.*?) .*
.*timestamp=(.*?) .*
If any one of these fields may be the very last field in the content line, you would need to append a blank space to the end of the content using the ReplaceText processor before sending your FlowFile to your ExtractText processor. Each value must be followed by a blank space so that the regex knows where the value ends.
Your ReplaceText processor configuration would look like this:
The "Replacement Value" is just a single space.
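The effect of the trailing space can be checked offline with a short Python sketch. This is an illustration under assumptions, not NiFi itself: the `capture` helper is hypothetical, Python's `re` module stands in for Java regular expressions (these particular patterns behave the same in both), and appending `" "` imitates the ReplaceText step.

```python
import re

line = ('srcip=10.10.10.10 timestamp=152532431 action="denied" '
        'logver=12 tz="UTC+0" logid="0000012" dstip=12.12.12.12')


def capture(field, content):
    """Apply the '.*<field>=(.*?) .*' pattern; return group 1, or None if no match."""
    match = re.search(r".*" + re.escape(field) + r"=(.*?) .*", content)
    return match.group(1) if match else None


wanted = ["srcip", "action", "dstip"]

# Without a trailing space, the last field has no space after its value,
# so its pattern cannot match.
without_space = {field: capture(field, line) for field in wanted}

# Imitate the ReplaceText step by appending a single space to the content.
with_space = {field: capture(field, line + " ") for field in wanted}

print(without_space)
print(with_space)
```

Only the fields named in `wanted` are captured; the other fields on the line are simply ignored, so there is no need to write a pattern for every field present in the content.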
Hope this helps,
Matt
