I am getting 3 attributes instead of one when using the ExtractText processor
- Labels: Apache NiFi
Created 05-06-2022 01:13 AM
Hi! I am very confused about how regular expressions and capture groups work in NiFi.
I read the documentation and saw that the ExtractText processor somehow always extracts more attributes than needed.
I have a file with a line like this:
9999, text
I wrote the regular expression (\d{4}) to extract the value 9999 into an attribute called number.
But instead of one attribute, number, I am getting three attributes: number0, number, and number1.
Can someone please explain why this is happening? The explanation in the documentation is really quite complex.
Thank you in advance!
Created 05-06-2022 11:20 AM
@Brenigan
The ExtractText processor supports 1 to 40 capture groups in a Java regular expression.
The user-added property defines the attribute into which the value from capture group 1 is placed.
The processor also creates additional attributes named by capture group number.
So in your case, you added a new property named "number" with the value (\d{4}).
This is a single capture group that reads 4 digits.
So your example (9999, text) would result in these attributes being created:
number = 9999 <-- always contains the value from capture group 1.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.
number.0 contains the entire match of the Java regular expression. Whether it is created is controlled by the "Include Capture Group 0" property.
Setting that property to false will stop this attribute from being added to your FlowFiles.
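To see where the three attributes come from, the same expression can be tried with plain `java.util.regex`. This is a minimal sketch, assuming the processor scans the content with `find()`-style matching (the pattern may occur anywhere in the text); `SingleGroupDemo` and `groupsOfFirstMatch` are hypothetical names, not part of NiFi. Because the whole pattern is the single group, group 0 (the whole match) and group 1 hold the same value here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleGroupDemo {
    // Hypothetical helper: returns every group of the first match,
    // index 0 being the whole match. Returns an empty array when
    // the pattern does not occur in the text.
    static String[] groupsOfFirstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        // find() searches anywhere in the text; ", text" is simply not part of the match.
        if (!matcher.find()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups[i] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOfFirstMatch("(\\d{4})", "9999, text");
        System.out.println("explicit groups: " + (groups.length - 1)); // 1
        System.out.println("group 0: " + groups[0]); // 9999, the whole match (number.0)
        System.out.println("group 1: " + groups[1]); // 9999, capture group 1 (number and number.1)
    }
}
```

One explicit group plus the implicit group 0 is what yields number, number.1, and number.0.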
To help understand this better, let's look at another example:
Suppose your Java regular expression had 2 capture groups instead: one for the four digits and one for the text after the comma.
Also assume we had "Include Capture Group 0" set to "true".
Now, with the same source text of "9999, text", we would expect to see these attributes added:
number = 9999 <-- always contains the value from capture group 1.
number.0 = 9999, text <-- the complete match from the Java regular expression.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.
number.2 = text <-- the ".2" signifies the capture group the value came from.
Setting "false" for "Include Capture Group 0" would have resulted in "number.0" not being created; however, number, number.1, and number.2 would have still been created.
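The naming scheme described above can be approximated in a few lines of Java. This is a sketch, not NiFi's implementation: `ExtractTextNamingSketch` and `toAttributes` are hypothetical names, and because the two-group expression from the original post was not preserved, `(\d{4}),\s*(.*)` is an assumed stand-in that produces the values listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractTextNamingSketch {
    // Hypothetical helper mimicking the attribute naming described in this thread.
    // Assumes the regex has at least one capture group, as ExtractText requires.
    static Map<String, String> toAttributes(String name, String regex, String content,
                                            boolean includeCaptureGroupZero) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (!matcher.find()) {
            return attributes; // no match: no attributes added
        }
        // The bare property name always receives capture group 1.
        attributes.put(name, matcher.group(1));
        // Indexed attributes start at 0 only when group 0 is included.
        int first = includeCaptureGroupZero ? 0 : 1;
        for (int i = first; i <= matcher.groupCount(); i++) {
            attributes.put(name + "." + i, matcher.group(i));
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Assumed two-group expression: four digits, a comma, optional spaces, then the rest.
        String regex = "(\\d{4}),\\s*(.*)";
        System.out.println(toAttributes("number", regex, "9999, text", true));
        // {number=9999, number.0=9999, text, number.1=9999, number.2=text}
        System.out.println(toAttributes("number", regex, "9999, text", false));
        // {number=9999, number.1=9999, number.2=text}
    }
}
```

Since the bare `number` key is always filled from group 1, turning the flag off only drops `number.0`; the numbered attributes for groups 1 and up are unaffected.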
This behavior lets the processor handle multiple use cases with a single property.
If you found this response helpful, please take a moment to log in and click "Accept as Solution" below this post.
Thank you,
Matt
