Parsing flow file or attribute in NiFi

Rising Star

I have two scenarios where I need help extracting values.

1) The flow file has the content below

*********content of flow file***********

field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue

********End of Content***************

I want to extract the values of field_1, field_2, field_3 ... field_n and store them in 3 attributes. Can I get a regular expression example for doing that with ExtractText in NiFi?

Or, say I want to extract the value of field_x (1 < x < n) from the list above; how can I do that?

2) If I have an attribute with the value below

**********attribute value***********

field_1=field1value&field_2=field2value&field_3=field3value&...&field_n=fieldnvalue

***********************************

I want to extract the value of field_x (1 < x < n) from the list above; how can I do that?

1 ACCEPTED SOLUTION

Master Mentor

@Anil Reddy

You could use the ExtractText processor, configured similarly to the following:

[Screenshot: 15323-screen-shot-2017-05-11-at-92413-am.png]

I changed the two standard properties shown and added two additional regex properties.

Using the following example input:

field_1=field1value&field_2=field2value&field_3=field3value&field_4=field4value

You will end up with the following attributes on your FlowFile:

[Screenshot: 15321-screen-shot-2017-05-11-at-92000-am.png]

A few additional attributes will be created that you can ignore, but you will have sequentially numbered attribute names with the associated values, plus one attribute, field_last, that holds the very last value in your input string.
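The exact regular expressions are only visible in the screenshot, so they are not reproduced here. As a rough illustration of the outcome described above (sequentially numbered attributes plus field_last), here is a minimal Python sketch. The function name numbered_values and the pattern are assumptions made for this illustration; they are not the ExtractText configuration from the screenshot.

```python
import re

def numbered_values(content):
    """Collect the values of an &-delimited name=value string in order.

    Hypothetical helper: returns field_1..field_N plus field_last,
    mimicking the attribute names described in the answer.
    """
    # Each pair starts at the beginning of the string or after '&';
    # the capture group is everything after '=' up to the next '&'.
    values = re.findall(r"(?:^|&)[^=&]+=([^&]*)", content)
    attrs = {f"field_{i}": v for i, v in enumerate(values, start=1)}
    if values:
        attrs["field_last"] = values[-1]
    return attrs

sample = "field_1=field1value&field_2=field2value&field_3=field3value&field_4=field4value"
attrs = numbered_values(sample)
for name, value in attrs.items():
    print(name, "=", value)
```

Note that this numbers the values by position, not by field name, which is why it only fits inputs like the field_1, field_2 example.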

Thanks,

Matt


16 REPLIES

Guru

Hi @Anil Reddy

If you are like me, and dislike RegEx, one trick you can try is to use the SplitContent processor first. Change config dropdown to use Text instead of Hexadecimal, and use the byte sequence of your pair delimiter &. This would simplify the RegEx if you wanted to use ExtractText still. Or perhaps you can explore using another SplitContent processor on the = to get the field and value tokens separately. Hopefully you can avoid the RegEx there.
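To make the split-based idea concrete outside of NiFi, the following Python sketch splits first on the pair delimiter & and then on =. It only illustrates the logic that the two SplitContent steps would perform; the function name split_pairs is hypothetical and nothing here calls a NiFi API.

```python
def split_pairs(content, pair_delim="&", kv_delim="="):
    """Split 'a=1&b=2' style content into a dict without any regex."""
    pairs = {}
    for token in content.split(pair_delim):
        if not token:
            continue  # skip empty tokens from leading/trailing delimiters
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = token.partition(kv_delim)
        if sep:
            pairs[name] = value
    return pairs

example = "firstname=testfirstname&lastname=testlastname&email=testemail@email.com&address=test address"
fields = split_pairs(example)
print(fields["email"])
```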

As always, if you find this post helpful, please accept the answer.

Rising Star

@Sonu Sahi

Thanks for the response. The approach mentioned above is not an ideal solution for my requirement.

Rising Star

@Matt Clarke: thanks for the solution. It works for the specific example of field_1, field_2.

My actual intention is this: the flow file contains several fields separated by the character &, and I would like to extract a specific field based on its field name. I have been experimenting with regular expressions to achieve that but have not succeeded so far. In fact, I am trying to understand the regular expressions you specified but cannot quite grasp them.

Can you please let me know how to extract the fields from a flow file whose content is, say,

firstname=testfirstname&lastname=testlastname,email=testemail@email.com,address=test address.

I want to extract the attributes firstname, lastname, email, and address with the appropriate values.

Master Mentor

@Anil Reddy

Your new example does not use a "&" between all your fields. Is that a typo? I see "&" between first two fields and "," after that.

Rising Star

Sorry, that was a typo.

firstname=testfirstname&lastname=testlastname&email=testemail@email.com&address=test address.

Master Mentor

@Anil Reddy

Are the four fields you are trying to extract values from consistently the following?

"firstname"

"lastname"

"email"

"address"

Rising Star

Yes, I want to extract the values of firstname, lastname, email, and address.

We can assume the following.

If the flow file's content is

&firstname=testfirstname&lastname=testlastname&email=testemail@email.com&address=test address&

I want to extract XXXX in the regex &firstname=XXXX&.* as the value of the firstname field,

and extract XXXX in the regex .*&lastname=XXXX&.* as the value of lastname, and so on.
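One way to express "the value of a named field" as a regular expression is to anchor the name at the start of the content or after an &, then capture everything up to the next &. The Python sketch below tests that idea on the sample content; the helper extract_field is hypothetical. The same pattern, for example (?:^|&)firstname=([^&]*), should also be valid Java regex syntax, so it could presumably be tried in an ExtractText property named after the field, but how capture groups map to attribute names should be checked against the ExtractText documentation.

```python
import re

def extract_field(content, name):
    """Return the value of `name` in &-delimited name=value content, or None."""
    # (?:^|&) ensures we match a whole field name, not the tail of another
    # one (e.g. 'name' must not match inside 'firstname').
    pattern = r"(?:^|&)" + re.escape(name) + r"=([^&]*)"
    match = re.search(pattern, content)
    return match.group(1) if match else None

content = "&firstname=testfirstname&lastname=testlastname&email=testemail@email.com&address=test address&"
for field in ("firstname", "lastname", "email", "address"):
    print(field, "->", extract_field(content, field))
```

Because the value is captured with [^&]*, the pattern works whether or not the content has a leading or trailing &.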

Rising Star

Just for tagging!

@Matt Clarke