@Brenigan
The ExtractText processor supports 1 to 40 capture groups in a Java regular expression.
The name of the user-added property defines the attribute into which the value from capture group 1 will be placed.
The processor also creates an additional attribute for each capture group, with the capture group number appended to the property name.
So in your case, you added a new property with:
This is a single capture group which reads 4 digits.
So in your example (9999, text), this would result in these attributes being created:
number = 9999 <-- always contains the value from capture group 1.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.
number.0 contains the entire text matched by the Java regular expression. This attribute is controlled by the "Include Capture Group 0" property.
Setting that property to false will stop number.0 from being added to your FlowFiles.
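To see why number.0 and number.1 hold the same value here, you can try the same idea with plain java.util.regex outside of NiFi. This is only a minimal sketch: the pattern below is an assumed stand-in for a single capture group that reads 4 digits, not necessarily the exact expression from your property, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleGroupDemo {
    // Assumed pattern: one capture group that matches four digits.
    static final Pattern FOUR_DIGITS = Pattern.compile("(\\d{4})");

    // Returns group 0 (the whole match) followed by each capture group,
    // or an empty array when the text does not match.
    static String[] groups(String text) {
        Matcher m = FOUR_DIGITS.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("9999, text");
        System.out.println("group 0 = " + g[0]); // whole match
        System.out.println("group 1 = " + g[1]); // first capture group
    }
}
```

Because the whole expression is wrapped in one capture group, group 0 and group 1 both come back as 9999 for this input.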
To help understand this better, let's look at another example:
Suppose your Java regular expression looked like this, with 2 capture groups instead:
Also assume we had "Include Capture Group 0" set to "true".
Now, with the same source text of "9999, text", we would expect to see these attributes added:
number = 9999 <-- always contains the value from capture group 1.
number.0 = 9999, text <-- the complete text matched by the Java regular expression.
number.1 = 9999 <-- the ".1" signifies the capture group the value came from.
number.2 = text <-- the ".2" signifies the capture group the value came from.
Setting "false" for "Include Capture Group 0" would have resulted in "number.0" not being created; however, number, number.1, and number.2 would have still been created.
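The attribute naming described above can be imitated in a few lines of plain Java if you want to experiment with an expression before putting it in the processor. Treat this as a rough sketch and not as the processor's actual code: the class name, the extract method, and the two-group pattern used in main are all assumptions for illustration, and it only looks at the first match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractTextSketch {
    // Hypothetical helper that mimics the naming scheme described above:
    // <name> and <name>.1 hold capture group 1, <name>.N holds group N,
    // and <name>.0 holds the whole match when includeGroupZero is true.
    static Map<String, String> extract(String name, String regex,
                                       String text, boolean includeGroupZero) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find() || m.groupCount() < 1) {
            return attrs; // no match, or no capture group to read
        }
        attrs.put(name, m.group(1));
        int first = includeGroupZero ? 0 : 1;
        for (int i = first; i <= m.groupCount(); i++) {
            attrs.put(name + "." + i, m.group(i));
        }
        return attrs;
    }

    public static void main(String[] args) {
        // Assumed two-group pattern: four digits, a comma and space, then a word.
        String regex = "(\\d{4}), (\\w+)";
        System.out.println(extract("number", regex, "9999, text", true));
        System.out.println(extract("number", regex, "9999, text", false));
    }
}
```

With the flag set to true, the map contains number, number.0, number.1, and number.2. With it set to false, number.0 is left out and the other three are unchanged.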
This functionality allows this processor component to handle multiple use cases.
If you found that this response helped with your query, please take a moment to log in and click on "Accept as Solution" below this post.
Thank you,
Matt