Created 09-02-2024 03:19 AM
Hello, I have a simple use case. I have an incoming file with a user name and a user address on each line. The user name is chars 1 to 5 and the user address is chars 6 to 15. How can I use the ExtractText processor for this? I tried using Search Value ^(.{5})(.{10}) and Replacement Value $1,$2
The issue is that for the address it captures everything from the 6th char to the end of the line, not just up to the 15th char. What should I modify?
Just as an experiment I tried ^(.{5})(.{10})(.{1}) with $1,$2,$3, and this captures the 6th to 15th chars properly. Please help.
Created on 09-02-2024 04:53 PM - edited 09-02-2024 04:55 PM
Hi,
It would have been helpful if you had provided some examples of the different scenarios, showing what you expect vs. what you are getting. A screenshot of the processor(s) in question would also help confirm that you have the correct configuration for your case. One thing that confuses me is that you don't mention whitespace, and whether it counts as a character in the name or the address.
Going with what you provided, if we assume we have the following line:
smithaddress123AAAA
where the name is expected to be: smith (1-5)
address: address123 (6-15)
I have configured the ExtractText processor as follows (basically adding new dynamic properties to define the extracted attributes):
The output flowfile will have the following attributes, which is what is expected:
The reason you are getting additional attributes with an index is how the processor breaks up matching groups. You can read more about this here.
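To illustrate the indexed attributes, here is a rough Python imitation of the naming scheme as I understand it from the ExtractText documentation: the property name holds capture group 1, `<name>.0` holds the whole match when "Include Capture Group 0" is enabled, and `<name>.N` holds group N. The function name, the property name `address` and its pattern are assumptions made up for this sketch; they are not taken from the screenshots, and this is not NiFi code.

```python
import re

def extract_text_attributes(prop_name, pattern, content, include_group_zero=True):
    """Imitate (roughly) how ExtractText names the attributes it creates.

    prop_name and pattern stand in for one dynamic property on the processor.
    """
    match = re.search(pattern, content)
    attrs = {}
    if match is None or match.re.groups < 1:
        return attrs  # nothing to extract in this sketch
    if include_group_zero:
        attrs[f"{prop_name}.0"] = match.group(0)  # the entire matched text
    for i in range(1, match.re.groups + 1):
        attrs[f"{prop_name}.{i}"] = match.group(i)  # one attribute per group
    attrs[prop_name] = match.group(1)  # plain name mirrors capture group 1
    return attrs

line = "smithaddress123AAAA"
# Hypothetical property: skip the 5-char name, capture the 10-char address
address_attrs = extract_text_attributes("address", r"^.{5}(.{10})", line)
print(address_attrs)
```

For the sample line, `address` and `address.1` both come out as `address123`, while `address.0` is the full 15-character match `smithaddress123`.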
If you find this helpful please accept the solution.
Thanks
Created on 09-03-2024 12:09 AM - edited 09-03-2024 12:48 AM
Hello @SAMSAL, I have used the ExtractText processor. This processor has a built-in property "Search Value", which I filled in as ^(.{5})(.{10}), and a property "Replacement Value", which I set to $1,$2
My Extract Processor conf is below
I also want to keep whitespace in my address. For example, if the line is: smithAb Cd 12345678
then I want the user name to be smith and the address to be Ab Cd 1234.
I also want to point out that I am basically constructing a comma-separated flowfile with this. The comma "," in $1,$2 is what makes the values comma separated in the end.
The issue is that everything works fine except that the last $2 is not limited to 10 chars and also takes the chars after the 10th. So it ultimately becomes Ab Cd 12345678 instead of Ab Cd 1234
As for the experimentation I mentioned: if I set "Search Value" to ^(.{5})(.{10})(.{1}) and "Replacement Value" to $1,$2,$3, then both user name and address come out as expected, and the $3 value appears to contain all the extra characters up to the end of the line.
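The experiment with the third group can be reproduced outside NiFi. Below is a minimal sketch in Python, assuming its `re.sub` behaves like a regex search-and-replace for this pattern; note that Python writes backreferences as `\1` where the Replacement Value uses `$1`. The variable names are made up for the illustration.

```python
import re

line = "smithAb Cd 12345678"

# Three groups: 5 chars, 10 chars, then exactly 1 char (16 chars matched in total)
three_groups = re.sub(r"^(.{5})(.{10})(.{1})", r"\1,\2,\3", line)

print(three_groups)  # smith,Ab Cd 1234,5678
```

Only the first 16 characters are matched and replaced. The third group holds the single character `5`, and the unmatched tail `678` is simply left in place after it, which is why the last field looks as if it captured everything up to the end of the line.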
Created on 09-03-2024 12:36 AM - edited 09-03-2024 12:37 AM
I think you are confusing the ExtractText and the ReplaceText processors. ExtractText doesn't have Search Value and Replacement Value properties, but ReplaceText does. That is why I said posting a screenshot would be helpful: had I known that it was ReplaceText, my answer would have been different.
To get the desired result in this case, you need to specify the following pattern in the Search Value property:
^(.{5})(.{10}).*
Basically, you need to match the full line of text that you want to replace with the matched groups. When you stopped at "^(.{5})(.{10})", it meant that you only wanted to replace the first 15 characters of the text with the result $1,$2. The characters after the 15th were not part of the match, so they were left untouched, and that is why you were getting the remainder of the text. Adding ".*" at the end makes the match cover the whole line, so the whole line is replaced and not just the first 15 characters.
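The difference between the two patterns can be checked with a small sketch. This is Python rather than NiFi, so backreferences are written `\1,\2` instead of `$1,$2`, and the helper name `to_csv_line` is an assumption made up for the example.

```python
import re

def to_csv_line(line, pattern):
    """Replace whatever the pattern matches with 'group1,group2'."""
    return re.sub(pattern, r"\1,\2", line)

line = "smithAb Cd 12345678"

# Original pattern: only the first 15 chars are matched, the tail stays behind
partial = to_csv_line(line, r"^(.{5})(.{10})")

# Suggested pattern: the trailing .* also matches the tail, so it is replaced away
full = to_csv_line(line, r"^(.{5})(.{10}).*")

print(partial)  # smith,Ab Cd 12345678
print(full)     # smith,Ab Cd 1234
```

The capture groups are identical in both cases; only the extent of the match changes, and with it how much of the line gets overwritten by the replacement.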
The final config will look like this
I hope that makes sense.
Created 09-03-2024 11:09 PM
Thanks, it was the ReplaceText processor, and this regex really helped.