<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Nifi Extract Text - Match on text and return the characters that follow in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287800#M213243</link>
    <description>&lt;P&gt;I'm very new to NiFi and regex.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a test .txt file containing a mock log.&lt;BR /&gt;The format is along the lines of:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;srcip=10.10.10.10 timestamp=152532431 action="denied"&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I need is to match on the field name (for example, srcip) and then return everything after the '=' up to the next space character.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;</description>
    <pubDate>Thu, 16 Jan 2020 16:15:47 GMT</pubDate>
    <dc:creator>JamesE</dc:creator>
    <dc:date>2020-01-16T16:15:47Z</dc:date>
    <item>
      <title>Nifi Extract Text - Match on text and return the characters that follow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287800#M213243</link>
      <description>&lt;P&gt;I'm very new to NiFi and regex.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a test .txt file containing a mock log.&lt;BR /&gt;The format is along the lines of:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;srcip=10.10.10.10 timestamp=152532431 action="denied"&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I need is to match on the field name (for example, srcip) and then return everything after the '=' up to the next space character.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 16:15:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287800#M213243</guid>
      <dc:creator>JamesE</dc:creator>
      <dc:date>2020-01-16T16:15:47Z</dc:date>
    </item>
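    <!--
A minimal plain-Java sketch of the match described in the question. It assumes Java's java.util.regex, which is the regex flavour ExtractText evaluates. The sketch first locates the field name and then captures everything after `=` up to the next space, using the character class `[^ ]+`. This swaps in `[^ ]+` in place of the lazy `(.*?) ` pattern used in the replies. `KeyValueSketch` and `valueAfter` are hypothetical names, not NiFi APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: match a field name, return what follows '=' up to the next space.
class KeyValueSketch {

    // Returns the characters after "key=" up to (not including) the next space,
    // or null when the key does not appear in the line.
    static String valueAfter(String line, String key) {
        // \b stops "ip" from matching inside "srcip"; Pattern.quote escapes regex metacharacters in the key.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "=([^ ]+)");
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\"";
        System.out.println(valueAfter(line, "srcip"));     // 10.10.10.10
        System.out.println(valueAfter(line, "timestamp")); // 152532431
        System.out.println(valueAfter(line, "action"));    // "denied" (quotes included)
    }
}
```

In an ExtractText dynamic property, the same idea would be written without Java string escaping, for example `srcip=([^ ]+)`. Because `[^ ]+` also stops at the end of the content, this form may avoid the trailing-space workaround discussed in the replies. Treat that as untested against a real flow.
    -->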
    <item>
      <title>Re: Nifi Extract Text - Match on text and return the characters that follow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287807#M213248</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/73296"&gt;@JamesE&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The ExtractText processor is used to extract text from the content of the FlowFie using a Java Regular Expression and insert that extracted text in to FlowFile attributes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So using your FlowFile content example here:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;srcip=10.10.10.10 timestamp=152532431 action="denied"&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;What is your desired end result?&lt;BR /&gt;Three separate FlowFileAttributes? One for each "word" (srcip, timestamp, and action).&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Assuming above, you would add three new properties to the ExtractText processor (one for each extracted value) as follows:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2020-01-16 at 1.20.39 PM.png" style="width: 967px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/26065iBB3D2BB45F274CAE/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2020-01-16 at 1.20.39 PM.png" alt="Screen Shot 2020-01-16 at 1.20.39 PM.png" /&gt;&lt;/span&gt;&lt;BR /&gt;For each dynamic property added via the "+" icon, The property name becomes the FlowFile attribute name and the resulting string from capture group within the Java regular expression becomes the value assigned to that new FlowFile attribute.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 18:23:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287807#M213248</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2020-01-16T18:23:29Z</dc:date>
    </item>
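    <!--
A simplified plain-Java model of the behaviour this reply describes. It assumes each dynamic property's regex is applied with a find()-style search over the content. Each property name becomes an attribute name, and capture group 1 becomes its value. `ExtractTextModel`, `extract`, and the three property regexes are hypothetical stand-ins; the screenshot holds the actual configuration. The real processor also writes indexed attributes (for example `srcip.1`), which this model omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model (not NiFi code): property name maps to attribute name,
// first capture group maps to attribute value.
class ExtractTextModel {

    static Map<String, String> extract(String content, Map<String, String> dynamicProperties) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> prop : dynamicProperties.entrySet()) {
            Matcher m = Pattern.compile(prop.getValue()).matcher(content);
            // Only add an attribute when the regex matches and defines at least one capture group.
            if (m.find() && m.groupCount() >= 1) {
                attributes.put(prop.getKey(), m.group(1));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Hypothetical property values, one per field to extract.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("srcip", "srcip=(.*?) ");
        props.put("timestamp", "timestamp=(.*?) ");
        props.put("action", "action=\"(.*?)\"");
        String content = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\"";
        System.out.println(extract(content, props));
        // {srcip=10.10.10.10, timestamp=152532431, action=denied}
    }
}
```

The `action` pattern captures between the double quotes, so the attribute value is `denied` rather than `"denied"`. A property whose regex does not match simply produces no attribute in this model.
    -->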
    <item>
      <title>Re: Nifi Extract Text - Match on text and return the characters that follow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287826#M213261</link>
      <description>&lt;P&gt;Hi Matt,&lt;/P&gt;&lt;P&gt;Thank you for the reply.&lt;/P&gt;&lt;P&gt;That is what I am after. However, say those three fields were part of a much longer list of, say, 25 fields, and I only wanted certain ones.&lt;/P&gt;&lt;P&gt;Would I have to list all of them, as you have, to get the desired FlowFile attributes out?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For instance:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;srcip=10.10.10.10 timestamp=152532431 action="denied" logver=12 tz="UTC+0" logid="0000012" dstip=12.12.12.12&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Say I wanted srcip, action and dstip but none of the others. Would I need to list every field within each new property?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would the configuration below work, and if not, why?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Annotation 2020-01-16 233049.png" style="width: 999px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/26072i742C8409E15413FF/image-size/large?v=v2&amp;amp;px=999" role="button" title="Annotation 2020-01-16 233049.png" alt="Annotation 2020-01-16 233049.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jan 2020 23:36:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287826#M213261</guid>
      <dc:creator>JamesE</dc:creator>
      <dc:date>2020-01-16T23:36:26Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Extract Text - Match on text and return the characters that follow</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287889#M213304</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/73296"&gt;@JamesE&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can handle this easily using a different set of Java Regular Expressions:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;.*action=(.*?) .*
.*srcip=(.*?) .*
.*timestamp=(.*?) .*&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If any one of these fields could be the very last field in the content line, you would need to append a blank space to the end of the content for this to work. Use a ReplaceText processor to do that before sending your FlowFile to your ExtractText processor.&amp;nbsp; Each value needs to be followed by a blank space so the regex knows where the value ends. The lazy group (.*?) stops at the first space after the field name, so without a trailing space the expression for the last field does not match at all.&lt;BR /&gt;&lt;BR /&gt;Your ReplaceText processor configuration would look like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2020-01-17 at 5.03.38 PM.png" style="width: 574px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/26079iD1900D13944CA4E4/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screen Shot 2020-01-17 at 5.03.38 PM.png" alt="Screen Shot 2020-01-17 at 5.03.38 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The "Replacement Value" is just a single space.&lt;BR /&gt;&lt;BR /&gt;Hope this helps,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 22:05:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Nifi-Extract-Text-Match-on-text-and-return-the-characters/m-p/287889#M213304</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2020-01-17T22:05:57Z</dc:date>
    </item>
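    <!--
The trailing-space requirement can be checked in plain Java. This assumes ExtractText evaluates each expression with a find()-style search over the content. The `srcip` and `action` expressions are the reply's own. The `dstip` expression is a hypothetical extra property with the same shape, chosen because `dstip` is the last field in the follow-up example. `TrailingSpaceDemo` and `capture` are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shows why a space is appended: the lazy group (.*?) must be followed by a literal space.
class TrailingSpaceDemo {

    // Returns capture group 1 of the first match, or null if the regex does not match.
    static String capture(String regex, String content) {
        Matcher m = Pattern.compile(regex).matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample line from the follow-up question; dstip is the last field.
        String line = "srcip=10.10.10.10 timestamp=152532431 action=\"denied\" logver=12 "
                + "tz=\"UTC+0\" logid=\"0000012\" dstip=12.12.12.12";

        // Expression from the reply: the value is followed by a space, so it matches.
        System.out.println(capture(".*srcip=(.*?) .*", line));       // 10.10.10.10
        // The double quotes around the value are captured too.
        System.out.println(capture(".*action=(.*?) .*", line));      // "denied"

        // Same shape for dstip (hypothetical property): no trailing space, so no match.
        System.out.println(capture(".*dstip=(.*?) .*", line));       // null
        // Appending a single space, as the ReplaceText step does, makes it match.
        System.out.println(capture(".*dstip=(.*?) .*", line + " ")); // 12.12.12.12
    }
}
```

Appending the space once to the content fixes whichever field happens to be last, so the individual expressions do not need to change.
    -->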
  </channel>
</rss>

