<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question I am getting 3 attributes instead of one, using ExtractText Processor. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343306#M233896</link>
    <description>&lt;P&gt;Hi! I am very confused about how regular expressions and capture groups work in NiFi.&lt;/P&gt;&lt;P&gt;I read the documentation and saw that the ExtractText processor somehow always extracts more attributes than needed.&lt;/P&gt;&lt;P&gt;I have a file with a line like this:&lt;/P&gt;&lt;P&gt;9999, text&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wrote a regular expression to extract the value 9999 into an attribute called number:&amp;nbsp; (\d{4})&lt;/P&gt;&lt;P&gt;But instead of the single attribute number, I am getting the attributes number0, number and number1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can someone please explain why this is happening? The explanation in the documentation is quite complex.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
    <pubDate>Fri, 06 May 2022 08:13:55 GMT</pubDate>
    <dc:creator>Brenigan</dc:creator>
    <dc:date>2022-05-06T08:13:55Z</dc:date>
    <item>
      <title>I am getting 3 attributes instead of one, using ExtractText Processor.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343306#M233896</link>
      <description>&lt;P&gt;Hi! I am very confused about how regular expressions and capture groups work in NiFi.&lt;/P&gt;&lt;P&gt;I read the documentation and saw that the ExtractText processor somehow always extracts more attributes than needed.&lt;/P&gt;&lt;P&gt;I have a file with a line like this:&lt;/P&gt;&lt;P&gt;9999, text&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wrote a regular expression to extract the value 9999 into an attribute called number:&amp;nbsp; (\d{4})&lt;/P&gt;&lt;P&gt;But instead of the single attribute number, I am getting the attributes number0, number and number1.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can someone please explain why this is happening? The explanation in the documentation is quite complex.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2022 08:13:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343306#M233896</guid>
      <dc:creator>Brenigan</dc:creator>
      <dc:date>2022-05-06T08:13:55Z</dc:date>
    </item>
    <item>
      <title>Re: I am getting 3 attributes instead of one, using ExtractText Processor.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343374#M233914</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/97722"&gt;@Brenigan&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The ExtractText processor supports 1 to 40 capture groups in a Java regular expression.&lt;BR /&gt;The user-added property defines the attribute into which the value from capture group one will be placed.&lt;BR /&gt;&lt;BR /&gt;The processor also creates an additional attribute for each capture group, suffixed with the capture group number.&lt;BR /&gt;So in your case you added a new property with:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_0-1651860431777.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34331i85570CD425E89A38/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_0-1651860431777.png" alt="MattWho_0-1651860431777.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a single capture group which reads 4 digits.&lt;BR /&gt;So in your example (9999, text) this would result in creating these attributes:&lt;BR /&gt;number = 9999 &amp;lt;-- always contains the value from capture group 1.&lt;BR /&gt;number.1 = 9999&amp;nbsp; &amp;lt;-- the ".1" signifies the capture group the value came from.&lt;BR /&gt;&lt;BR /&gt;number.0 contains the entire text matched by the Java regular expression (here also 9999).&amp;nbsp; This attribute is controlled by this property:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_1-1651860653088.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34332iA7A1C3A704F86251/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_1-1651860653088.png" alt="MattWho_1-1651860653088.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Setting it to false stops the number.0 attribute from being added to your FlowFiles.&lt;BR /&gt;&lt;BR /&gt;To help understand this better, let's look at another example:&lt;BR /&gt;Suppose your Java regular expression looked like this, with 2 capture groups instead:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_2-1651860803371.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/34333iEC51F833306CD6AE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MattWho_2-1651860803371.png" alt="MattWho_2-1651860803371.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Also assume we had "Include Capture Group 0" set to "true".&lt;BR /&gt;&lt;BR /&gt;Now, with the same source text of "9999, text", we would expect to see these attributes added:&lt;BR /&gt;number = 9999 &amp;lt;-- always contains the value from capture group 1.&lt;BR /&gt;number.0 = 9999, text&amp;nbsp; &amp;lt;-- the complete match from the Java regular expression.&lt;/P&gt;&lt;P&gt;number.1 = 9999 &amp;lt;-- the ".1" signifies the capture group the value came from.&lt;BR /&gt;number.2 = text&amp;nbsp; &amp;lt;-- the ".2" signifies the capture group the value came from.&lt;BR /&gt;&lt;BR /&gt;Setting "Include Capture Group 0" to "false" would have resulted in "number.0" not being created; however, number, number.1, and number.2 would still have been created.&lt;BR /&gt;&lt;BR /&gt;This functionality allows this processor component to handle multiple use cases.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If you found that this response assisted with your query, please take a moment to log in and click on "&lt;STRONG&gt;Accept as Solution&lt;/STRONG&gt;" below this post.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2022 18:20:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343374#M233914</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2022-05-06T18:20:31Z</dc:date>
    </item>
  </channel>
</rss>

