Extract char from a String in a flow file

Contributor

Hello, I have a simple use case. I have an incoming file with a user name and a user address on each line. The user name is characters 1 to 5 and the user address is characters 6 to 15. How do I use the ExtractText processor for this? I tried using ^(.{5})(.{10}) as the Search Value and $1,$2 as the Replacement Value.

The issue I am having is that for the address it captures everything from the 6th character to the end of the line, not just up to the 15th character. What should I modify?
As an experiment I tried ^(.{5})(.{10})(.{1}) with $1,$2,$3, and that captures characters 6 to 15 properly. Please help.

1 ACCEPTED SOLUTION

Super Guru

I think you are confusing the ExtractText and ReplaceText processors. ExtractText doesn't have Search Value and Replacement Value properties, but ReplaceText does. That is why I said posting a screenshot would be helpful: had I known it was ReplaceText, my answer would have been different.

To get the desired result in this case, you need to specify the following pattern in the Search Value property:

 

^(.{5})(.{10}).*

 

Basically, the pattern needs to match the full line of text that you want to replace with the matched groups. When you stopped at "^(.{5})(.{10})", only the first 15 characters of the line were replaced with the result of $1,$2, and the unmatched rest of the line was left in place. That is why you were getting the remainder of the text. Adding ".*" at the end makes the pattern match, and therefore replace, the whole line and not just the first 15 characters.

The final config will look like this

SAMSAL_0-1725348697318.png

I hope that makes sense.
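To see the difference outside NiFi, here is a minimal sketch using Python's re module as a stand-in. This is an assumption for illustration only: ReplaceText uses Java regular expressions, and Python writes the back-references as \1,\2 where the Replacement Value uses $1,$2. The sample line is the one from the thread.

```python
import re

line = "smithAb Cd 12345678"

# Pattern stops after 15 characters: only that prefix is replaced,
# so the unmatched tail ("5678") stays in the output.
partial = re.sub(r"^(.{5})(.{10})", r"\1,\2", line)

# Pattern also consumes the rest of the line: the whole line is
# replaced by the two groups, so the tail is dropped.
full = re.sub(r"^(.{5})(.{10}).*", r"\1,\2", line)

print(partial)  # smith,Ab Cd 12345678
print(full)     # smith,Ab Cd 1234
```

The first result is what the original Search Value produced; the second is what the pattern ending in ".*" produces.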

4 REPLIES

Super Guru

Hi,

It would have been helpful if you had provided some examples of the different scenarios, showing what is expected vs. what you are getting. A screenshot of the processor(s) in question would also help confirm that you have the correct configuration for your case. One thing that is confusing to me is that you don't mention anything about white spaces, and whether or not they count as characters in the name or the address.

Going with what you provided, if we assume we have the following line:

 

smithaddress123AAAA

 

where the name is expected to be: smith (1-5)

address: address123 (6-15)

I have configured the ExtractAddress processor as follows (basically adding new dynamic properties to define the extracted attributes):

SAMSAL_0-1725321039379.png

The output flowfile will have the following attributes, which is what is expected:

SAMSAL_1-1725321113423.png

The reason you are getting additional attributes with an index is how the processor breaks up the matching groups. You can read more about this here.
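As a rough illustration of the indexed attributes, here is a minimal Python sketch. Several things in it are assumptions: the screenshots are not reproduced here, so the property names "name" and "address" and their patterns are my guesses; extract_attributes is a hypothetical helper, not a NiFi API; and the naming (base attribute for the first group, ".0" for the whole match, ".1" and up for each group) follows my reading of the ExtractText documentation.

```python
import re

def extract_attributes(name, pattern, text):
    """Hypothetical helper mimicking attribute naming for one dynamic property."""
    match = re.search(pattern, text)
    if match is None:
        return {}
    # Index 0 holds the whole matched text (assumed "include group 0" behavior).
    attrs = {f"{name}.0": match.group(0)}
    # Each capture group gets its own indexed attribute.
    for i, value in enumerate(match.groups(), start=1):
        attrs[f"{name}.{i}"] = value
    # The bare attribute name carries the first capture group.
    if match.groups():
        attrs[name] = match.group(1)
    return attrs

line = "smithaddress123AAAA"
attrs = {}
attrs.update(extract_attributes("name", r"^(.{5})", line))
attrs.update(extract_attributes("address", r"^.{5}(.{10})", line))

for key in sorted(attrs):
    print(key, "=", attrs[key])
```

Here attrs["name"] is "smith" and attrs["address"] is "address123", while the ".0" and ".1" entries are the extra indexed attributes.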

If you find this helpful please accept the solution.

Thanks

 

 

Contributor

Hello @SAMSAL, I have used the ExtractText processor. This processor has a built-in property "Search Value", which I set to ^(.{5})(.{10}), and a property "Replacement Value", which I set to $1,$2

My Extract Processor conf is below

AlokKumar_0-1725347551807.png

 

I also want to keep whitespace in my address. For example, if the line is: smithAb Cd 12345678

then I want the user name to be smith and the address to be Ab Cd 1234.

I also want to point out that I am basically constructing a comma-separated flowfile with this. The comma "," in $1,$2 is what makes the values comma separated in the output.
Everything works fine except that the last value, $2, is not limited to 10 characters; the characters after the 10th are included as well. So the address ultimately becomes Ab Cd 12345678 instead of Ab Cd 1234

As for the experiment I mentioned: if I set "Search Value" to ^(.{5})(.{10})(.{1}) and "Replacement Value" to $1,$2,$3, then both the user name and the address come out as expected. In that case the $3 value contains the extra characters up to the end of the line.


Contributor

Thanks, it was the ReplaceText processor, and this regex really helped.