- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Apache NiFi - JOLT Expression: Flatten Hierarchical Data & Distribute Value from JSON into Array of Objects
- Labels:
-
Apache NiFi
Created 10-22-2024 02:18 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Folks,
This is likely a multipart problem, but I've been trying to get the following sample JSON to distribute the "id" value from the root of the JSON object into a flattened array of objects with the "id" value and the "code" value from each category (or subcategory in the nested structure). For example, given the following JSON payload:
{
"id": "1234",
"categories": [
{
"code": "A",
"name": "Alpha",
"subcategories": [
{
"code": "AA",
"name": "Alpha Alpha"
},
{
"code": "AB",
"name": "Alpha Bravo"
}
]
},
{
"code": "B",
"name": "Bravo"
},
{
"code": "C",
"name": "Charlie"
},
{
"code": "D",
"name": "Delta"
}
]
}
The Desired Output would be:
[
{
"individualId": "1234",
"categoryCode": "A",
"categoryName": "Alpha"
},
{
"individualId": "1234",
"categoryCode": "AA",
"categoryName": "Alpha Alpha"
},
{
"individualId": "1234",
"categoryCode": "AB",
"categoryName": "Alpha Bravo"
},
{
"individualId": "1234",
"categoryCode": "B",
"categoryName": "Bravo"
},
{
"individualId": "1234",
"categoryCode": "C",
"categoryName": "Charlie"
},
{
"individualId": "1234",
"categoryCode": "D",
"categoryName": "Delta"
}
]
I tried the following JOLT Transformation Expression:
[
{
"operation": "shift",
"spec": {
"id": "id",
"categories": {
"*": {
"code": "categories[&1].categoryCode",
"name": "categories[&1].categoryName",
"@(2,id)": "categories[&1].individualId",
"subcategories": {
"*": {
"code": "categories[&3].categoryCode",
"name": "categories[&3].categoryName",
"@(4,id)": "categories[&3].individualId"
}
}
}
}
}
},
{
"operation": "shift",
"spec": {
"categories": {
"*": "[]"
}
}
}
]
But got this, not sure why the subcategories keep getting grouped into the first element,. Any insight or help would be much appreciated:
[ {
"individualId" : [ "1234", "1234", "1234" ],
"categoryCode" : [ "A", "AA", "AB" ],
"categoryName" : [ "Alpha", "Alpha Alpha", "Alpha Bravo" ]
}, {
"individualId" : "1234",
"categoryCode" : "B",
"categoryName" : "Bravo"
}, {
"individualId" : "1234",
"categoryCode" : "C",
"categoryName" : "Charlie"
}, {
"individualId" : "1234",
"categoryCode" : "D",
"categoryName" : "Delta"
} ]
Chief Technology Officer, Penacity, LLC
Created 10-23-2024 11:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@MDTechie - Here is a provided solution for the question you asked. I usually like to Jolt Transforms step by step. Please see the Jolt Spec below along with explanations for each step.
[
{
"operation": "shift",
"spec": {
"id": "id",
"categories": {
"*": {
"@": "categories[]",
"subcategories": {
"*": {
"@": "categories[]"
}
}
}
}
}
},
{
"operation": "remove",
"spec": {
"categories": {
"*": {
"subcategories": ""
}
}
}
},
{
"operation": "shift",
"spec": {
"categories": {
"*": {
"code": "categories[&1].categoryCode",
"name": "categories[&1].categoryName",
"@(2,id)": "categories[&1].individualId"
}
}
}
},
{
"operation": "shift",
"spec": {
"categories": {
"*": ""
}
}
}
]
The first operation
- Copies the id field from the original input to the top level of the output.
- Iterates over the categories array and:
- Copies each category object to a new array called categories[].
- If a category contains a subcategories array, each subcategory object is also copied into the same categories[] array, effectively flattening the nested structure.
The second operation
- Searches within each category object in the categories[] array.
- Removes the subcategories field if it exists.
The third operation
- Iterates over the categories[] array, processing each category object.
- Renames the code field to categoryCode and the name field to categoryName for consistency with the new output schema.
- Adds a new field called individualId inside each category object, using the id value from two levels up (the root of the original input).
The fourth operation:
- Takes each object from the categories[] array and moves it to the root level.
- As a result, the categories array wrapper is removed, leaving a flat array of individual objects.
I suggest piecing it out in a JOLT tester to understand it a little better. Hope this helps! 🙂
If you found the solution helpful, please "Accept as Solution"
Created 10-22-2024 04:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@MDTechie Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho @SAMSAL who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres,Community Moderator
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 10-23-2024 06:28 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @DianaTorres , thank you for the warm welcome. Appreciate you tagging the cavalry!
Best,
Chief Technology Officer, Penacity, LLC
Created 10-23-2024 11:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@MDTechie - Here is a provided solution for the question you asked. I usually like to Jolt Transforms step by step. Please see the Jolt Spec below along with explanations for each step.
[
{
"operation": "shift",
"spec": {
"id": "id",
"categories": {
"*": {
"@": "categories[]",
"subcategories": {
"*": {
"@": "categories[]"
}
}
}
}
}
},
{
"operation": "remove",
"spec": {
"categories": {
"*": {
"subcategories": ""
}
}
}
},
{
"operation": "shift",
"spec": {
"categories": {
"*": {
"code": "categories[&1].categoryCode",
"name": "categories[&1].categoryName",
"@(2,id)": "categories[&1].individualId"
}
}
}
},
{
"operation": "shift",
"spec": {
"categories": {
"*": ""
}
}
}
]
The first operation
- Copies the id field from the original input to the top level of the output.
- Iterates over the categories array and:
- Copies each category object to a new array called categories[].
- If a category contains a subcategories array, each subcategory object is also copied into the same categories[] array, effectively flattening the nested structure.
The second operation
- Searches within each category object in the categories[] array.
- Removes the subcategories field if it exists.
The third operation
- Iterates over the categories[] array, processing each category object.
- Renames the code field to categoryCode and the name field to categoryName for consistency with the new output schema.
- Adds a new field called individualId inside each category object, using the id value from two levels up (the root of the original input).
The fourth operation:
- Takes each object from the categories[] array and moves it to the root level.
- As a result, the categories array wrapper is removed, leaving a flat array of individual objects.
I suggest piecing it out in a JOLT tester to understand it a little better. Hope this helps! 🙂
If you found the solution helpful, please "Accept as Solution"
Created on 10-23-2024 12:49 PM - edited 10-23-2024 12:51 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @drewski7 ,
Now I have another minor problem.. how do I send you a six pack to thank you for the help??? 😉
Been banging my head on this for days, still going through the logic to learn from it. Really appreciate the detailed explanation. 👍 🙂
Cheers! :clinking_beer_mugs:
Chief Technology Officer, Penacity, LLC
