<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Extract char from a String in a flow file in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/392976#M248302</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;It would have helped if you had provided some examples of the different scenarios, showing what you expect versus what you are getting. A screenshot of the processor(s) in question would also help confirm that you have the correct configuration for your case. One thing that confuses me is that you don't mention whitespace and whether it counts as a character in the name or the address.&lt;/P&gt;&lt;P&gt;Going with what you provided, assume we have the following line:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;smithaddress123AAAA&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where the name is expected to be: smith (1-5)&lt;/P&gt;&lt;P&gt;address: &lt;SPAN&gt;address123&lt;/SPAN&gt; (6-15)&lt;/P&gt;&lt;P&gt;I have configured the ExtractText processor as follows (basically adding new dynamic properties to define the extracted attributes):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1725321039379.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41630i33E1B0C96E059663/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1725321039379.png" alt="SAMSAL_0-1725321039379.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The output flowfile will have the following attributes, which is what is expected:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_1-1725321113423.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41631iD532E088C187513F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_1-1725321113423.png" alt="SAMSAL_1-1725321113423.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The reason you are getting additional attributes with an index is the way the processor breaks up matching groups. You can read more about this &lt;A href="https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343306" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;If you find this helpful, please accept the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Mon, 02 Sep 2024 23:55:08 GMT</pubDate>
    <dc:creator>SAMSAL</dc:creator>
    <dc:date>2024-09-02T23:55:08Z</dc:date>
    <item>
      <title>Extract char from a String in a flow file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/392959#M248295</link>
      <description>&lt;P&gt;Hello, I have a simple use case. I have an incoming file with, say, a user name and a user address in each line. The user name is from char 1 to char 5 and the user address is from char 6 to char 15. How can I use the ExtractText processor for this? I tried using Search Value as ^(.{5})(.{10}) and Replacement Value as $1,$2&lt;/P&gt;&lt;P&gt;The issue I am having is that for the address it is capturing all the chars from the 6th char to the last, and not just up to the 15th char. What should I modify?&lt;BR /&gt;Just as an experiment, I tried ^(.{5})(.{10})(.{1}) and $1,$2,$3, and this is able to capture properly from the 6th char to the 15th char. Please help.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Sep 2024 10:19:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/392959#M248295</guid>
      <dc:creator>AlokKumar</dc:creator>
      <dc:date>2024-09-02T10:19:21Z</dc:date>
    </item>
    <item>
      <title>Re: Extract char from a String in a flow file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/392976#M248302</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;It would have helped if you had provided some examples of the different scenarios, showing what you expect versus what you are getting. A screenshot of the processor(s) in question would also help confirm that you have the correct configuration for your case. One thing that confuses me is that you don't mention whitespace and whether it counts as a character in the name or the address.&lt;/P&gt;&lt;P&gt;Going with what you provided, assume we have the following line:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;smithaddress123AAAA&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where the name is expected to be: smith (1-5)&lt;/P&gt;&lt;P&gt;address: &lt;SPAN&gt;address123&lt;/SPAN&gt; (6-15)&lt;/P&gt;&lt;P&gt;I have configured the ExtractText processor as follows (basically adding new dynamic properties to define the extracted attributes):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1725321039379.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41630i33E1B0C96E059663/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1725321039379.png" alt="SAMSAL_0-1725321039379.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The output flowfile will have the following attributes, which is what is expected:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_1-1725321113423.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41631iD532E088C187513F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_1-1725321113423.png" alt="SAMSAL_1-1725321113423.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The reason you are getting additional attributes with an index is the way the processor breaks up matching groups. You can read more about this &lt;A href="https://community.cloudera.com/t5/Support-Questions/I-am-getting-3-attributes-instead-of-one-using-ExtractText/m-p/343306" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;If you find this helpful, please accept the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 02 Sep 2024 23:55:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/392976#M248302</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-09-02T23:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: Extract char from a String in a flow file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393000#M248306</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;, I have used the ExtractText processor. This processor has a built-in property&lt;SPAN&gt;&amp;nbsp;"Search Value", which I filled in as ^(.{5})(.{10}), and a property "Replacement Value", which I filled in as $1,$2&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My Extract processor configuration is below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="AlokKumar_0-1725347551807.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41640i5012E551CEFD0CC2/image-size/medium?v=v2&amp;amp;px=400" role="button" title="AlokKumar_0-1725347551807.png" alt="AlokKumar_0-1725347551807.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I also want to allow whitespace in my address. For example, if the line is: smithAb Cd 12345678&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;then I want the user name to be smith and the address to be Ab Cd 1234.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I also want to point out that I am basically constructing a comma-separated flowfile with this. The comma "," in $1,$2 makes the values comma-separated in the output.&lt;BR /&gt;The issue is that everything works fine except that the last $2 is not limited to 10 chars; the chars after the 10th are included as well, so the address ultimately becomes Ab Cd 12345678 instead of Ab Cd 1234.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;As for the experiment I mentioned: if I fill in "Search Value" as ^(.{5})(.{10})(.{1}) and "Replacement Value" as $1,$2,$3, then both the user name and the address come out as expected, and the $3 replacement value is followed by the extra characters up to the end of the line.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Sep 2024 07:48:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393000#M248306</guid>
      <dc:creator>AlokKumar</dc:creator>
      <dc:date>2024-09-03T07:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Extract char from a String in a flow file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393004#M248307</link>
      <description>&lt;P&gt;I think you are confusing the &lt;STRONG&gt;ExtractText&lt;/STRONG&gt; and the &lt;STRONG&gt;ReplaceText&lt;/STRONG&gt; processors. ExtractText doesn't have Search Value &amp;amp; Replacement Value properties, but ReplaceText does. That is why I said posting a screenshot would be helpful: had I known that it was ReplaceText, my answer would have been different.&lt;/P&gt;&lt;P&gt;To get the desired result in this case, you need to specify the following pattern in the Search Value property:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;^(.{5})(.{10}).*&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Basically, the pattern needs to match the full line of text that you want to replace with the matched groups. When you stopped at "^(.{5})(.{10})", it meant that only the text up to the 15th character was replaced with the result $1,$2, and that is why you were getting the remainder of the text after it. Adding ".*" at the end makes it replace the whole line and not just up to the 15th character.&lt;/P&gt;&lt;P&gt;The final config will look like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1725348697318.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/41641i213ED0B7359BA58D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1725348697318.png" alt="SAMSAL_0-1725348697318.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I hope that makes sense.&lt;/P&gt;</description>
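      <!--
      The effect of the trailing ".*" can be reproduced outside NiFi. The sketch below is
      only an illustration in Python, assuming that re.sub behaves like the ReplaceText
      regex replacement for this particular pattern. NiFi itself uses Java regular
      expressions, where the replacement is written $1,$2 rather than \1,\2. The function
      names are hypothetical; the sample line is the one from the thread.

```python
import re

# Sample line from the thread: 5-char name, 10-char address, then extra characters.
SAMPLE_LINE = "smithAb Cd 12345678"


def replace_prefix_only(line):
    # The pattern matches only the first 15 characters, so only those are
    # replaced; whatever follows them stays in the output unchanged.
    return re.sub(r"^(.{5})(.{10})", r"\1,\2", line)


def replace_whole_line(line):
    # The trailing .* makes the match cover the rest of the line as well,
    # so the whole line is replaced by the two captured groups.
    return re.sub(r"^(.{5})(.{10}).*", r"\1,\2", line)


if __name__ == "__main__":
    print(replace_prefix_only(SAMPLE_LINE))  # smith,Ab Cd 12345678
    print(replace_whole_line(SAMPLE_LINE))   # smith,Ab Cd 1234
```

      The first call shows the behaviour reported in the question (the address appears to
      run past 10 characters), and the second shows the result after adding ".*".
      -->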
      <pubDate>Tue, 03 Sep 2024 07:37:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393004#M248307</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2024-09-03T07:37:19Z</dc:date>
    </item>
    <item>
      <title>Re: Extract char from a String in a flow file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393028#M248314</link>
      <description>&lt;P&gt;Thanks, it was the ReplaceText processor, and this regex really helped.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Sep 2024 06:09:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Extract-char-from-a-String-in-a-flow-file/m-p/393028#M248314</guid>
      <dc:creator>AlokKumar</dc:creator>
      <dc:date>2024-09-04T06:09:59Z</dc:date>
    </item>
  </channel>
</rss>

