Support Questions

Find answers, ask questions, and share your expertise

Apache NiFi - JOLT Expression: Flatten Hierarchical Data & Distribute Value from JSON into Array of Objects

avatar
New Contributor

Hi Folks,

This is likely a multipart problem, but I've been trying to get the following sample JSON to distribute the "id" value from the root of the JSON object into a flattened array of objects with the "id" value and the "code" value from each category (or subcategory in the nested structure).  For example, given the following JSON payload:

 

{
  "id": "1234",
  "categories": [
    {
      "code": "A",
      "name": "Alpha",
      "subcategories": [
        {
          "code": "AA",
          "name": "Alpha Alpha"
        },
        {
          "code": "AB",
          "name": "Alpha Bravo"
        }
      ]
    },
    {
      "code": "B",
      "name": "Bravo"
    },
    {
      "code": "C",
      "name": "Charlie"
    },
    {
      "code": "D",
      "name": "Delta"
    }
  ]
}

 

The Desired Output would be: 

 

[
  {
    "individualId": "1234",
    "categoryCode": "A",
    "categoryName": "Alpha"
  },
  {
    "individualId": "1234",
    "categoryCode": "AA",
    "categoryName": "Alpha Alpha"
  },
  {
    "individualId": "1234",
    "categoryCode": "AB",
    "categoryName": "Alpha Bravo"
  },
  {
    "individualId": "1234",
    "categoryCode": "B",
    "categoryName": "Bravo"
  },
  {
    "individualId": "1234",
    "categoryCode": "C",
    "categoryName": "Charlie"
  },
  {
    "individualId": "1234",
    "categoryCode": "D",
    "categoryName": "Delta"
  }
]

 

I tried the following JOLT Transformation Expression: 

 

[
  {
    "operation": "shift",
    "spec": {
      "id": "id",
      "categories": {
        "*": {
          "code": "categories[&1].categoryCode",
          "name": "categories[&1].categoryName",
          "@(2,id)": "categories[&1].individualId",
          "subcategories": {
            "*": {
              "code": "categories[&3].categoryCode",
              "name": "categories[&3].categoryName",
              "@(4,id)": "categories[&3].individualId"
            }
          }
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "categories": {
        "*": "[]"
      }
    }
  }
]

 

But got this, not sure why the subcategories keep getting grouped into the first element,. Any insight or help would be much appreciated: 

 

[ {
  "individualId" : [ "1234", "1234", "1234" ],
  "categoryCode" : [ "A", "AA", "AB" ],
  "categoryName" : [ "Alpha", "Alpha Alpha", "Alpha Bravo" ]
}, {
  "individualId" : "1234",
  "categoryCode" : "B",
  "categoryName" : "Bravo"
}, {
  "individualId" : "1234",
  "categoryCode" : "C",
  "categoryName" : "Charlie"
}, {
  "individualId" : "1234",
  "categoryCode" : "D",
  "categoryName" : "Delta"
} ]

 



Phill Nacelli
Chief Technology Officer, Penacity, LLC
1 ACCEPTED SOLUTION

avatar
Expert Contributor

@MDTechie - Here is a provided solution for the question you asked. I usually like to Jolt Transforms step by step. Please see the Jolt Spec below along with explanations for each step.

 

[
  {
    "operation": "shift",
    "spec": {
      "id": "id",
      "categories": {
        "*": {
          "@": "categories[]",
          "subcategories": {
            "*": {
              "@": "categories[]"
            }
          }
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "categories": {
        "*": {
          "subcategories": ""
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "categories": {
        "*": {
          "code": "categories[&1].categoryCode",
          "name": "categories[&1].categoryName",
          "@(2,id)": "categories[&1].individualId"
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "categories": {
        "*": ""
      }
    }
  }
]

 

The first operation

  1. Copies the id field from the original input to the top level of the output.
  2. Iterates over the categories array and:
    • Copies each category object to a new array called categories[].
    • If a category contains a subcategories array, each subcategory object is also copied into the same categories[] array, effectively flattening the nested structure.

The second operation

  1. Searches within each category object in the categories[] array.
  2. Removes the subcategories field if it exists.

The third operation

 

  1. Iterates over the categories[] array, processing each category object.
  2. Renames the code field to categoryCode and the name field to categoryName for consistency with the new output schema.
  3. Adds a new field called individualId inside each category object, using the id value from two levels up (the root of the original input).

The fourth operation:

 

  1. Takes each object from the categories[] array and moves it to the root level.
  2. As a result, the categories array wrapper is removed, leaving a flat array of individual objects.

I suggest piecing it out in a JOLT tester to understand it a little better. Hope this helps! 🙂 

If you found the solution helpful, please "Accept as Solution"

 

 



View solution in original post

4 REPLIES 4

avatar
Community Manager

@MDTechie Welcome to the Cloudera Community!

To help you get the best possible solution, I have tagged our NiFi experts @MattWho @SAMSAL  who may be able to assist you further.

Please keep us updated on your post, and we hope you find a satisfactory solution to your query.


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
New Contributor

Hi @DianaTorres , thank you for the warm welcome.  Appreciate you tagging the cavalry!

Best,

Phill Nacelli
Chief Technology Officer, Penacity, LLC

avatar
Expert Contributor

@MDTechie - Here is a provided solution for the question you asked. I usually like to Jolt Transforms step by step. Please see the Jolt Spec below along with explanations for each step.

 

[
  {
    "operation": "shift",
    "spec": {
      "id": "id",
      "categories": {
        "*": {
          "@": "categories[]",
          "subcategories": {
            "*": {
              "@": "categories[]"
            }
          }
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "categories": {
        "*": {
          "subcategories": ""
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "categories": {
        "*": {
          "code": "categories[&1].categoryCode",
          "name": "categories[&1].categoryName",
          "@(2,id)": "categories[&1].individualId"
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "categories": {
        "*": ""
      }
    }
  }
]

 

The first operation

  1. Copies the id field from the original input to the top level of the output.
  2. Iterates over the categories array and:
    • Copies each category object to a new array called categories[].
    • If a category contains a subcategories array, each subcategory object is also copied into the same categories[] array, effectively flattening the nested structure.

The second operation

  1. Searches within each category object in the categories[] array.
  2. Removes the subcategories field if it exists.

The third operation

 

  1. Iterates over the categories[] array, processing each category object.
  2. Renames the code field to categoryCode and the name field to categoryName for consistency with the new output schema.
  3. Adds a new field called individualId inside each category object, using the id value from two levels up (the root of the original input).

The fourth operation:

 

  1. Takes each object from the categories[] array and moves it to the root level.
  2. As a result, the categories array wrapper is removed, leaving a flat array of individual objects.

I suggest piecing it out in a JOLT tester to understand it a little better. Hope this helps! 🙂 

If you found the solution helpful, please "Accept as Solution"

 

 



avatar
New Contributor

Hi @drewski7 , 

Now I have another minor problem.. how do I send you a six pack to thank you for the help??? 😉

Been banging my head on this for days, still going through the logic to learn from it.  Really appreciate the detailed explanation. 👍 🙂

Cheers! :clinking_beer_mugs:

Phill Nacelli
Chief Technology Officer, Penacity, LLC