Created 05-10-2017 08:58 PM
I have two scenarios where I need help extracting values.
1) Flow file have the content as below
*********content of flow file***********
field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue
********End of Content***************
I want to extract the values of field1, field2, field3 ... field_n and store them in 3 attributes. Can I get a regular expression example to do that using ExtractText in NiFi?
Or, say I want to extract the value of field_x (1 < x < n) from the above list; how can I do that?
2) If I have an attribute with value as below
**********attribute value***********
field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue
***********************************
I want to extract the value of field_x (1 < x < n) from the above list; how can I do that?
Created on 05-11-2017 01:25 PM - edited 08-17-2019 06:18 PM
You could use the ExtractText processor configured similar to the following:
I changed the two shown standard properties and added two additional regex properties.
Using the following example input:
field_1=field1value&field_2=field2value&field_3=field3value&field_4=field4value
You will end up with the following attributes on your FlowFile:
There will be a few additional attributes created that you can ignore, but you will have sequentially numbered attribute names with the associated values, plus one field_last attribute that holds the very last value in your input string.
Thanks,
Matt
Created 05-11-2017 07:06 PM
The structure, or the order of the fields, is not consistent.
Let me put it this way: there are a large number of fields separated by '&' in the flow file, and I just want to extract specific fields from the flow file content, like
&testfield=testfieldvalue&testafield=testafieldvalue&... m number of fields...&firstname=firstnamevalue&... n number of fields ... &lastname=lastnamevalue& .... x number of fields ... &email=testemail@email.com& .... y number of fields .... &address=test address&...z number of fields .... &
Hope this gives a better picture!
Created 05-11-2017 07:33 PM
I got the answer. I can do something like the following to extract a specific field value. For example, for firstname, I use this in the ExtractText processor:
.*&firstname=(.*?)&.*
But I see copies of the attribute, such as firstname, firstname.0, firstname.1, etc. How can I prevent the extra attributes from being created and generate only "firstname"?
Created on 05-11-2017 07:54 PM - edited 08-17-2019 06:17 PM
You can get rid of capture group 0 by changing the "Include Capture Group 0" property to false.
Capture groups are defined using (); whatever falls between the parentheses is a capture group. You can create a single Java regular expression that has multiple capture groups. The ExtractText processor creates an attribute for each of these capture groups, named <propertyname>.<group number>. The value of the first capture group is also assigned to the attribute named after the property you created.
In the above I have created four different Java regular expressions. Each runs against the content of the incoming FlowFile. Each regex has one capture group, (.*?). As a result you will have address.1 set to the value of that one capture group. NiFi also creates the attribute exactly as you named it, "address". So the <propertyname>.num attributes are created because of the capture groups in your regex.
For example, let's assume the firstname property had the following regex:
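To make the naming concrete, here is a minimal Python sketch that mimics the attribute names described above. It is not NiFi code: the function `extract_text_attributes` and the pattern `firstname=(.*?)&` are hypothetical, and it assumes (as the ExtractText documentation describes) that the bare property name receives the first capture group. Python's `re` module behaves like Java regex for this simple pattern.

```python
import re


def extract_text_attributes(property_name, pattern, content, include_group_0=True):
    """Mimic the attribute names ExtractText produces for one property (a sketch)."""
    match = re.search(pattern, content)
    attributes = {}
    if match is None:
        return attributes  # no match means no attributes are added

    # Group 0 is the entire matched text; groups 1..n are the () capture groups.
    start = 0 if include_group_0 else 1
    for index in range(start, match.re.groups + 1):
        value = match.group(index)
        if value:
            attributes[f"{property_name}.{index}"] = value

    # Assumption: the bare property name carries the first capture group.
    if match.re.groups >= 1 and match.group(1):
        attributes[property_name] = match.group(1)
    return attributes


content = "&firstname=firstnamevalue&lastname=lastnamevalue&address=test address&"
pattern = r"firstname=(.*?)&"  # hypothetical pattern for illustration

with_group_0 = extract_text_attributes("firstname", pattern, content)
without_group_0 = extract_text_attributes(
    "firstname", pattern, content, include_group_0=False
)
print(with_group_0)
print(without_group_0)
```

With "Include Capture Group 0" set to false, the sketch drops only the firstname.0 entry; firstname and firstname.1 both remain and hold the same value.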
firstname=(.*?)[&]{1}lastname=(.*?)[&]{1}
Here you see I have two capture groups in one regular expression.
This would translate into two new attributes on your FlowFile:
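As a rough illustration, the two-group regex above can be run in Python against a sample string. The sample content is an assumption based on the field names used earlier in this thread, and the numbered names follow the <propertyname>.num convention described above.

```python
import re

# The two-capture-group regex from the firstname property above.
pattern = r"firstname=(.*?)[&]{1}lastname=(.*?)[&]{1}"

# Assumed sample content; lastname must directly follow firstname for a match.
content = "firstname=firstnamevalue&lastname=lastnamevalue&email=testemail@email.com"

match = re.search(pattern, content)

# One numbered entry per capture group: firstname.1 and firstname.2.
groups = {}
if match is not None:
    for index in range(1, match.re.groups + 1):
        groups[f"firstname.{index}"] = match.group(index)

print(groups)
```

Note that this single expression only matches when lastname immediately follows firstname, so it does not suit content where the field order varies.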
Thanks,
Matt
Created on 05-11-2017 08:28 PM - edited 08-17-2019 06:17 PM
Since you don't know where in the content these four fields will fall, I would suggest using the following:
The other regex rules I provided expect address as the last field.
These will work no matter where in the content these patterns are found.
The .*&firstname=(.*?)&.* expression will not work in all cases, since it requires the literal text "&firstname=" (including the leading "&"), followed by the value, followed by another "&". It will not match if firstname is the first field in the content (no leading "&") or the last field (no trailing "&").
Thanks,
Matt
If you found my answers helpful to your question, please mark it as "accepted"
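The failure cases can be checked quickly outside NiFi. The sketch below uses Python's `re` module, which treats these patterns the same way Java does. The sample strings are made up from the field names in this thread, and the `alternative` pattern is only one hypothetical way to handle all three positions, not necessarily the configuration suggested above.

```python
import re

original = r".*&firstname=(.*?)&.*"

# Hypothetical alternative: firstname must sit at the start of the content
# or right after an "&", and its value runs up to the next "&" or the end.
alternative = r"(?:^|&)firstname=([^&]*)"

samples = {
    "middle": "a=1&firstname=firstnamevalue&lastname=lastnamevalue",
    "first": "firstname=firstnamevalue&lastname=lastnamevalue",
    "last": "lastname=lastnamevalue&firstname=firstnamevalue",
}


def capture(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


results = {
    name: (capture(original, text), capture(alternative, text))
    for name, text in samples.items()
}

for name, (from_original, from_alternative) in results.items():
    print(name, from_original, from_alternative)
```

The original expression only captures a value in the "middle" case, while the alternative captures firstnamevalue in all three.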
Created 05-11-2017 08:27 PM
Thanks for the latest response. It's informative and gives a better understanding of regular expressions!
Created 06-06-2017 08:36 PM
Hi @Matt Clarke,
Does this one address=([a-zA-Z0-9 ,]+) mean NiFi can find the column named "address"?
I tried it, but no luck. Would you please upload an example?
Thanks.
Created 06-06-2017 09:02 PM
My suggestion would be to use something like:
You can enter your regex and the sample text you want to run it against.
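If you prefer to test locally, a few lines of Python do the same job. This is only a sketch: the second and third sample strings are invented for illustration, and Python's `re` is standing in for the Java regex engine NiFi uses, which behaves the same for this pattern.

```python
import re

pattern = r"address=([a-zA-Z0-9 ,]+)"

samples = [
    "&email=testemail@email.com&address=test address&",  # from this thread
    "address=12 Main St., Springfield&email=x",  # invented sample
    "&firstname=firstnamevalue&lastname=lastnamevalue&",  # no address field
]

captured = []
for text in samples:
    match = re.search(pattern, text)
    # Group 1 is the run of letters, digits, spaces and commas after "address=".
    captured.append(match.group(1) if match else None)

for text, value in zip(samples, captured):
    print(repr(text), "->", repr(value))
```

The pattern looks for the literal text "address=" anywhere in the content and captures the characters after it until it reaches one outside the class, so the second sample stops at the "." and yields only "12 Main St".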
Matt