How to merge many json files together using one common field?
Labels: Apache NiFi
Created 11-23-2021 01:55 AM
Hello everyone!
I have many json files like this:
{
"table_name" : "train_vd",
"data" : [ {
"battery_power" : 1954,
"clock_speed" : 0.5
} ]
}
{
"table_name" : "train_vd",
"data" : [ {
"battery_power" : 842,
"clock_speed" : 2.2
} ]
}
...
I used the MergeContent and MergeRecord processors with table_name as the Correlation Attribute Name (I have a ${table_name} attribute). However, this does not work, and the result is as follows:
[{
"table_name" : "train_vd",
"data" : [ {
"battery_power" : 509,
"clock_speed" : 0.6
} ]
}{
"table_name" : "train_vd",
"data" : [ {
"battery_power" : 842,
"clock_speed" : 2.2
} ]
}]
...
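The output above is not valid JSON: the objects are joined as `}{` with no separator. One plausible cause, assumed here because the exact processor settings are not shown, is MergeContent's byte-level concatenation with a `[` Header, a `]` Footer and no Demarcator. This Python sketch reproduces that effect on two of the sample documents, then shows that adding a `,` demarcator makes the result parseable but still leaves two separate objects instead of one merged `data` array. The helper `is_valid_json` is hypothetical, written only for this illustration.

```python
import json

# Two sample documents, one JSON object per FlowFile.
flowfiles = [
    {"table_name": "train_vd", "data": [{"battery_power": 509, "clock_speed": 0.6}]},
    {"table_name": "train_vd", "data": [{"battery_power": 842, "clock_speed": 2.2}]},
]
contents = [json.dumps(ff) for ff in flowfiles]


def is_valid_json(text: str) -> bool:
    """Hypothetical helper: True if the text parses as a single JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Assumed settings: "[" header, "]" footer, no demarcator between contents.
merged = "[" + "".join(contents) + "]"
print(merged)
print(is_valid_json(merged))  # objects are glued together as "}{"

# With a "," demarcator the text parses, but it is still a list of
# separate objects, not one object whose "data" holds every record.
with_demarcator = "[" + ",".join(contents) + "]"
print(is_valid_json(with_demarcator))
print(len(json.loads(with_demarcator)))
```

In other words, MergeContent only concatenates bytes; it never looks inside the JSON, so it cannot combine the nested `data` arrays on its own.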
However, I want to get the following result:
[{
"table_name" : "train_vd",
"data" : [ {
"battery_power" : 509,
"clock_speed" : 0.6
},
{
"battery_power" : 842,
"clock_speed" : 2.2
}]
}]
Could you tell me how to solve this problem? Do I need a complex Jolt transformation, or should I configure the incoming Avro schema in the MergeRecord processor so that everything is combined on a single field?
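Whichever NiFi processor ends up doing the work, the transformation being asked for is a group-by on `table_name` with the `data` arrays concatenated. This minimal Python sketch states that logic precisely; it is not a NiFi configuration or a Jolt spec, and the function name `merge_by_table` is hypothetical.

```python
import json


def merge_by_table(documents):
    """Group documents on "table_name" and concatenate their "data" arrays."""
    groups = {}  # dicts keep insertion order, so tables appear in first-seen order
    for doc in documents:
        groups.setdefault(doc["table_name"], []).extend(doc.get("data", []))
    return [{"table_name": name, "data": rows} for name, rows in groups.items()]


docs = [
    {"table_name": "train_vd", "data": [{"battery_power": 509, "clock_speed": 0.6}]},
    {"table_name": "train_vd", "data": [{"battery_power": 842, "clock_speed": 2.2}]},
]
result = merge_by_table(docs)
print(json.dumps(result, indent=2))
```

Documents with different `table_name` values end up as separate entries in the output list, each with its own combined `data` array.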
Created 11-23-2021 07:25 AM
Thank you for your answer! All my JSON FlowFiles have a "table_name" FlowFile attribute. There may be a problem with the JSON schema itself. In the meantime the task has changed, and I have created a new question about Jolt:
https://community.cloudera.com/t5/Support-Questions/Jolt-transform/td-p/330850
If you know the answer to it, I would be very grateful!
Created 11-23-2021 05:26 AM
@Protector
Do all your JSON FlowFiles have a FlowFile attribute named "table_name"? MergeContent does not pull table_name from the FlowFile content (your JSON) itself.
The Correlation Attribute Name property in MergeContent looks for this FlowFile attribute on each incoming FlowFile and allocates FlowFiles that share the same attribute value to the same bin. A bin is merged when it meets the other configured minimums on MergeContent, when Max Bin Age is reached, or when all bins already have FlowFiles allocated and another bin is needed, which forces the merge of the oldest bin.
Thank you,
Matt
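The binning rules in the answer above can be modelled in a few lines. This Python sketch is a simplified model written for illustration, not NiFi's implementation: the class `Binner` and its parameters `min_entries`, `max_bins` and `max_bin_age` are hypothetical stand-ins for the corresponding MergeContent settings. It shows that the bin key comes from a FlowFile attribute rather than the content, and the three ways a bin gets merged.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    created: float
    flowfiles: list = field(default_factory=list)


class Binner:
    """Simplified model of correlation-attribute binning (not NiFi source)."""

    def __init__(self, attribute, min_entries, max_bins, max_bin_age):
        self.attribute = attribute
        self.min_entries = min_entries
        self.max_bins = max_bins
        self.max_bin_age = max_bin_age
        self.bins = {}    # attribute value -> Bin; insertion order = oldest first
        self.merged = []  # (attribute value, [contents]) in merge order

    def _merge(self, key):
        bin_ = self.bins.pop(key)
        self.merged.append((key, [ff["content"] for ff in bin_.flowfiles]))

    def offer(self, flowfile, now):
        # The key is read from the FlowFile's attributes, never from its content.
        # In this model, FlowFiles lacking the attribute all share the None bin.
        key = flowfile["attributes"].get(self.attribute)
        if key not in self.bins:
            if len(self.bins) >= self.max_bins:
                # All bins in use: force-merge the oldest bin to free a slot.
                self._merge(next(iter(self.bins)))
            self.bins[key] = Bin(created=now)
        self.bins[key].flowfiles.append(flowfile)
        if len(self.bins[key].flowfiles) >= self.min_entries:
            self._merge(key)  # minimum entry count reached

    def expire(self, now):
        # Merge every bin that has reached the maximum bin age.
        for key in [k for k, b in self.bins.items() if now - b.created >= self.max_bin_age]:
            self._merge(key)


binner = Binner("table_name", min_entries=2, max_bins=2, max_bin_age=60.0)
binner.offer({"attributes": {"table_name": "train_vd"}, "content": "A"}, now=0.0)
binner.offer({"attributes": {"table_name": "test_vd"}, "content": "B"}, now=1.0)
binner.offer({"attributes": {"table_name": "train_vd"}, "content": "C"}, now=2.0)
binner.offer({"attributes": {"table_name": "other"}, "content": "D"}, now=3.0)
binner.offer({"attributes": {"table_name": "x"}, "content": "E"}, now=4.0)
binner.expire(now=100.0)
print(binner.merged)
```

Here "train_vd" merges once it reaches two entries, "test_vd" is force-merged when a third distinct key arrives while both bins are occupied, and the remaining bins merge on age. Note that a merged bin is still just a list of contents; as with MergeContent, nothing in this step combines the JSON structures themselves.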
