Parsing flow file or attribute in NiFi

Rising Star

I have two scenarios where I need help extracting values.

1) The flow file has the content below

*********content of flow file***********

field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue

********End of Content***************

I want to extract the values of field_1, field_2, field_3 ... field_n and store them in 3 attributes. Can I get an example regular expression to do that using ExtractText in NiFi?

Or, say I want to extract only the value of field_x (1 < x < n) from the list above. How can I do that?

2) If I have an attribute with the value below

**********attribute value***********

field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue

***********************************

I want to extract the value of field_x (1 < x < n) from the list above. How can I do that?

1 ACCEPTED SOLUTION

Super Mentor

@Anil Reddy

You could use the ExtractText processor, configured similarly to the following:

15323-screen-shot-2017-05-11-at-92413-am.png

I changed the two shown standard properties and added two additional regex properties.

Using the following example input:

field_1=field1value&field_2=field2value&field_3=field3value&field_4=field4value

You will end up with the following attributes on your FlowFile:

15321-screen-shot-2017-05-11-at-92000-am.png

A few additional attributes will be created that you can ignore, but you will have sequentially numbered attribute names with the associated values, plus one attribute named field_last that holds the very last value in your input string.
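The screenshots are not reproduced here, so the exact expressions used in the processor are unknown. As a rough illustration of the outcome described above, the following Python sketch pulls each value out of the sample input in order and names the results sequentially. The helper name `sequential_fields`, the way the names are built, and the use of Python's `re` module (NiFi itself uses Java regular expressions) are assumptions made for illustration only.

```python
import re


def sequential_fields(content):
    """Return the values of key=value pairs in order, keyed field_1..field_n plus field_last."""
    # Each value runs from '=' up to the next '&' or the end of the string.
    values = re.findall(r"=([^&]*)", content)
    attrs = {f"field_{i}": value for i, value in enumerate(values, start=1)}
    if values:
        # Mirror the extra attribute that holds the final value.
        attrs["field_last"] = values[-1]
    return attrs


sample = "field_1=field1value&field_2=field2value&field_3=field3value&field_4=field4value"
for name, value in sequential_fields(sample).items():
    print(name, "=", value)
```

This only mimics the resulting attribute names; it is not the configuration shown in the original screenshot.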

Thanks,

Matt


16 REPLIES

Rising Star

@Matt Clarke

The structure, or rather the order of the fields, is not consistent.

Let me put it this way: there are a large number of fields separated by '&' in the flow file, and I just want to extract specific fields from the flow file content, like

&testfield=testfieldvalue&testafield=testafieldvalue&... m number of fields...&firstname=firstnamevalue&... n number of fields ... &lastname=lastnamevalue& .... x number of fields ... &email=testemail@email.com& .... y number of fields .... &address=test address&...z number of fields .... &

Hope this gives a better picture!

Rising Star

@Matt Clarke

I got the answer. I can do something like the following to extract a specific field value. For example, for firstname, I use this in the ExtractText processor:

.*&firstname=(.*?)&.*

But I see copies of the attribute, such as firstname, firstname.0, firstname.1, etc. How can I avoid creating the extra attributes and generate only "firstname"?

Super Mentor

@Anil Reddy

You can get rid of capture group 0 by changing the "Include Capture Group 0" property to false.

Capture groups are defined using (); whatever falls between the parentheses is a capture group. A single Java regular expression can contain multiple capture groups. The ExtractText processor creates an attribute for each capture group (<propertyname>.1, <propertyname>.2, and so on), with capture group 0 holding the complete match. The value of the first capture group is also assigned to the attribute named after the property you created.

15336-screen-shot-2017-05-11-at-33603-pm.png

In the above, I have created four different Java regular expressions. Each runs against the content of the incoming FlowFile, and each has one capture group, (.*?). As a result, address.1 is set to the value of that one capture group. NiFi also creates the attribute you defined, named just "address". So the <propertyname>.num attributes are created because of the capture groups in your regex.
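To make the naming scheme concrete, here is a small Python sketch that imitates it for a single property. It is a simplification, not NiFi code: the function name `extract_text_attrs`, the sample content, and the pattern `address=(.*?)&` are assumptions, and the rule that the bare property name receives the first capture group reflects my reading of the ExtractText documentation.

```python
import re


def extract_text_attrs(prop_name, pattern, content, include_group_zero=True):
    """Imitate ExtractText attribute naming for one regex property (illustrative only)."""
    match = re.search(pattern, content)
    attrs = {}
    if match is None:
        return attrs
    if include_group_zero:
        # Group 0 is the complete match, stored as <propertyname>.0
        attrs[f"{prop_name}.0"] = match.group(0)
    for index in range(1, match.re.groups + 1):
        value = match.group(index)
        if value is not None:
            attrs[f"{prop_name}.{index}"] = value
            if index == 1:
                # The first capture group is also stored under the bare name.
                attrs[prop_name] = value
    return attrs


content = "&firstname=Jane&lastname=Doe&address=1 Main St&"
print(extract_text_attrs("address", r"address=(.*?)&", content))
print(extract_text_attrs("address", r"address=(.*?)&", content, include_group_zero=False))
```

With `include_group_zero=False` the `.0` entry disappears, which corresponds to setting "Include Capture Group 0" to false; the `.1` entry remains because the pattern still contains a capture group.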

For example, let's assume the firstname property had the following regex:

firstname=(.*?)[&]{1}lastname=(.*?)[&]{1}

Here you can see I have two capture groups in one regular expression.

This would translate into two new attributes on your FlowFile:

15337-screen-shot-2017-05-11-at-35325-pm.png

Thanks,

Matt

Super Mentor

@Anil Reddy

Since you don't know where in the content these four fields will fall, I would suggest using the following:

15339-screen-shot-2017-05-11-at-42059-pm.png

The other regex rules I provided expect address to be the last field.

The expressions above will work no matter where in the content these patterns are found.

The expression .*&firstname=(.*?)&.* will not work in all cases, because it requires the literal "&firstname=" (with a leading "&"), followed by the shortest run of characters up to the next "&", with any characters allowed before and after. It will therefore fail when firstname is the first field (no leading "&") or the last field (no trailing "&") in the content.
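The difference can be seen by running both styles of expression against inputs where firstname appears first, in the middle, and last. In this Python sketch, the `robust` pattern is my own assumption of a position-independent form, not the expression from the screenshot, and the sample inputs are invented.

```python
import re

fragile = r".*&firstname=(.*?)&.*"
# Assumed alternative: anchor on start-of-content or '&', and stop at the next '&'.
robust = r"(?:^|&)firstname=([^&]*)"

samples = {
    "first": "firstname=Jane&lastname=Doe",
    "middle": "id=7&firstname=Jane&lastname=Doe",
    "last": "id=7&lastname=Doe&firstname=Jane",
}


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if there is no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


results = {
    position: (first_group(fragile, text), first_group(robust, text))
    for position, text in samples.items()
}
for position, (fragile_value, robust_value) in results.items():
    print(position, "fragile:", fragile_value, "robust:", robust_value)
```

Only the "middle" sample satisfies the fragile expression, while the assumed alternative finds the value in all three positions.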

Thanks,

Matt

If you found my answers helpful to your question, please mark it as "accepted"

Rising Star

@Matt Clarke

Thanks for the latest response. It's informative and gives me a better understanding of regular expressions!

Expert Contributor

Hi @Matt Clarke,

Does this one address=([a-zA-Z0-9 ,]+) mean NiFi can find the column named "address"?

I tried it, but no luck. Would you please upload an example?

Thanks.

Super Mentor

@Alvin Jin

My suggestion would be to use something like:

https://regex101.com/

You can enter your regex and the sample text you want to run it against.

Matt
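
The same kind of check can also be run locally. This Python sketch tests the expression quoted in the question, address=([a-zA-Z0-9 ,]+), against invented sample text. The sample values are assumptions, and Python's `re` module is assumed to treat this pattern the same way Java does.

```python
import re

pattern = r"address=([a-zA-Z0-9 ,]+)"


def find_address(text):
    """Return the captured address value, or None when the pattern does not match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Invented sample content for illustration.
sample = "firstname=Jane&address=12 Main St, Springfield&email=jane@example.com"
print(find_address(sample))

# Characters outside the class (here the '.') end the capture early.
print(find_address("address=12 Main St. Apt 4&zip=12345"))

# No literal "address=" in the text means no match at all.
print(find_address("firstname=Jane&lastname=Doe"))
```

The expression searches the text for the literal "address=" and captures letters, digits, spaces, and commas after it, so a value containing any other character is cut short at that character.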