Merge json events based on property

Expert Contributor

The current workflow exports each event individually.

We are looking to merge all JSON events that share the same service/eventName over a period of time and export them to S3. Our requirement is to merge them using Expression Language at runtime.

1 ACCEPTED SOLUTION

Master Guru

The MergeContent processor can be used to merge JSON together and has a property called "Correlation Attribute Name" which when specified will merge together flow files that have the same value for the attribute specified.

In your scenario you first need to use EvaluateJsonPath, with Destination set to "flowfile-attribute", to extract "service" and "eventName" from the JSON document into attributes. Based on your sample JSON it seems like they are at the root level of the document, so I believe something like:

service = $.service
eventName = $.eventName

Then you need to get these two values into a single attribute, so you can use UpdateAttribute with something like:

serviceEventName = ${service}/${eventName}

Then in MergeContent set the "Correlation Attribute Name" to "serviceEventName". You can also set "Minimum Group Size" and "Max Bin Age" so that a bin is merged once it holds either 100 MB or 1 hour worth of data.
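
For readers who want to see the grouping logic outside NiFi, here is a minimal Python sketch of what the steps above amount to. It is not how MergeContent is implemented: the function name bin_by_correlation and the sample events are hypothetical, and the sketch ignores size and age thresholds entirely.

```python
import json
from collections import defaultdict


def bin_by_correlation(raw_events):
    """Group raw JSON strings by a "service/eventName" key.

    Illustrative only: mimics correlating on a serviceEventName
    attribute, with no size or age limits on the bins.
    """
    bins = defaultdict(list)
    for raw in raw_events:
        event = json.loads(raw)
        # Same idea as $.service and $.eventName, then ${service}/${eventName}
        key = f"{event['service']}/{event['eventName']}"
        bins[key].append(raw)
    return dict(bins)


# Hypothetical sample events (field values are made up)
events = [
    '{"service": "s3", "eventName": "PutObject", "time": "t1"}',
    '{"service": "ec2", "eventName": "RunInstances", "time": "t2"}',
    '{"service": "s3", "eventName": "PutObject", "time": "t3"}',
]
bins = bin_by_correlation(events)
```

Each list in bins corresponds to one merged output: events with the same service and event name end up together, in arrival order.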

5 REPLIES

Expert Contributor

@Bryan Bende Thanks for the answer, it worked for me. I am looking for one small configuration change. Currently, when I merge my JSON events and export them to S3, I get the events concatenated on a single line, delimited by a space. How can I get the JSON events delimited by a newline (\n) instead? Thank you.

Master Guru

In MergeContent there is a Delimiter Strategy property. Choose "Text", which means the processor uses the values typed into Header, Demarcator, and Footer. The Demarcator is what gets put between each FlowFile that is merged together. You can enter a newline with Shift+Enter.
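
To illustrate why a newline Demarcator is convenient downstream, here is a small Python sketch of the header/demarcator/footer idea. The records are hypothetical, and this only imitates the text layout, not the processor itself. With an empty header and footer and "\n" as the demarcator, the merged text can be read back one JSON event per line.

```python
import json

# Hypothetical events standing in for the merged FlowFiles
records = [
    {"service": "s3", "eventName": "PutObject"},
    {"service": "s3", "eventName": "GetObject"},
]

# Empty header and footer, newline between events
header, demarcator, footer = "", "\n", ""
merged = header + demarcator.join(json.dumps(r) for r in records) + footer

# A consumer can now parse the object one line at a time
parsed = [json.loads(line) for line in merged.splitlines()]
```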

Expert Contributor
@BigDataRocks - I believe that Bryan's answer above is accurate, so this is not intended to answer your question directly, but I wanted to mention that your directory structure above can be simplified to just:

eventsink/${service_type}/${event_name}/${now():format('yyyy/MM/dd/HHmmssSSS')}.${filename}.json

As you have it above, you are asking for "now()" multiple times, which could cause some weirdness if the hour rolls over between invocations, etc. Doing it all with a single call to now() addresses this and simplifies the configuration as well.
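
The same point can be shown in a few lines of Python. This is only an analogy for the expression above, assuming the format yyyy/MM/dd/HHmmssSSS; the function name build_key and the sample values are hypothetical. The clock is read once, and every path component is derived from that single value, so the parts cannot disagree across an hour or day boundary.

```python
from datetime import datetime


def build_key(service_type, event_name, filename, now=None):
    """Build an object key from a single timestamp (illustrative only)."""
    # Read the clock once so all date and time parts agree
    now = now or datetime.now()
    millis = now.microsecond // 1000
    # Rough equivalent of format('yyyy/MM/dd/HHmmssSSS')
    stamp = now.strftime("%Y/%m/%d/%H%M%S") + f"{millis:03d}"
    return f"eventsink/{service_type}/{event_name}/{stamp}.{filename}.json"


# Fixed timestamp just before midnight, to make the result repeatable
key = build_key("s3", "PutObject", "abc", datetime(2017, 1, 31, 23, 59, 59, 999000))
```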

Expert Contributor

@mpayne thanks for pointing it out 🙂