Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

[ NIFI ] Convert the flowfile content to an attribute

avatar
Explorer

Hello everyone

i have a content in Json (an array of id)

[{"id":7532233},{"id":755163}]

 

and i need to convert this whole array into one attribute so that i can after that refer to it in an ExecuteSql processor with statement :

update column 1  where id exists(${attributeName})

1 ACCEPTED SOLUTION

avatar

Hi,

If I understood your question correctly, you want to place the file content into an attribute and store it in sql? If that is the case you can use ExtractText Processor. there you add dynamic property lets call it "IdArray" and set the regular expression to "(?s)(^.*$)" to capture everything in the file. Just be careful there is a size limit set of 1 MB to be captured if you think your data will be more then you can increase it. Hope that helps, if so please accept solution.

Thanks

 

View solution in original post

5 REPLIES 5

avatar

Hi,

If I understood your question correctly, you want to place the file content into an attribute and store it in sql? If that is the case you can use ExtractText Processor. there you add dynamic property lets call it "IdArray" and set the regular expression to "(?s)(^.*$)" to capture everything in the file. Just be careful there is a size limit set of 1 MB to be captured if you think your data will be more then you can increase it. Hope that helps, if so please accept solution.

Thanks

 

avatar
Explorer

Thanks alot SAMSAL. It actually worked

but I've a two points if u can help me
1. since am not regex professional, is there a regex that could produce

7532233,755163
the id inside the array separated by comma 
since i will be using a sql statement like this 

SELECT  *
FROM    table
WHERE   id = ANY(ARRAY[${idArray}])

 2. even i changed the Maximum Buffer Size to 5 MB and Maximum Capture Group Length to 5000
i can see that the attribute is being duplicated with name 

idArray.1
Is that something i have to worry about ? while calling the attribute ?

avatar

Hi,

regarding the first point I dont think you can do that through regex, this is more of transformation operation and you might need to use JsonJolt Processor to do transromation first to simplify the json and store all id's in an array then use regular expression to extract the ids. It will complicate matter it might affect performance. What kind of SQL are you using ? Do you have any json functions in that sql? If that is the case I would recommend you deffer this process to SQL and utilize sql function to parse json.  For example in MS SQL you can use OPENJSON function to transpose json into table and then you can use this table in your query.

For the second point, this is because the way the extractText reg expression work it treats like group, so beside the main attribute you will get an indexed attribute for each matched group. You can read more about this here:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache...

 

You can also use this post to see if you can extract one attirbute. I just found about it:

https://community.cloudera.com/t5/Support-Questions/Extract-whole-json-flowfilecontent-small-to-sing...

 

avatar
Super Mentor

@IslamGamal 

Keep in mind that all the FlowFile attributes for a FlowFile are held in NiFi's JVM heap memory.  Creating large attributes on your FlowFiles can quickly eat up a lot of heap memory and affect JVM performance.

Thanks,
Matt

avatar
Explorer

Hi Matt,
Thanks for alerting me, but since I need to use the extracted id array inside a sql statement and I need to refer to the id array (the only way to do that is to convert the content it to an attribute as far as I know)
Please if u have an article about how to handle the heap memory let me know
because i need to know the cycle and when it's been clean is it after the flowfile finish or something else
in other scenarios the flowfile will contains a lot of attributes and there is millions of data coming in
so I appreciate your info