Support Questions

Find answers, ask questions, and share your expertise
NIFI - How to split non root node (json array), but include root level attribute in flowfile

Contributor

In the following JSON:

{
  "p": {
    "key": "k1",
    "theme": "default"
  },
  "version": "1.1.0",
  "s": [
    { "x": 1, "y": "0.1" },
    { "x": 2, "y": "0.2" }
  ]
}

I want to split the "s" array node, but I'd like to include the root-level attributes p.key and p.theme in each split "s" flowfile, as shown below. How do I do this in NiFi?

k1, default, 1, 0.1

k1, default, 2, 0.2

Thanks

1 ACCEPTED SOLUTION

Master Guru

Using JoltTransformJSON, you can inject the "key" and "theme" entries from p into the array, and create a top-level array from it. Try the following Shift spec:

{
  "operation": "shift",
  "spec": {
    "s": {
      "*": {
        "@(2,p)": {
          "key": "[#3].key",
          "theme": "[#3].theme"
        },
        "*": "[#2].&"
      }
    }
  }
}

Given your input, it produces the following output:

[ {
  "key" : "k1",
  "theme" : "default",
  "x" : 1,
  "y" : "0.1"
}, {
  "key" : "k1",
  "theme" : "default",
  "x" : 2,
  "y" : "0.2"
} ]

Now you can use SplitJson (with a JSON Path of $) to get the individual records. If you want to keep them as JSON, you're done; if you want to convert to CSV, you'd need EvaluateJsonPath as @Timothy Spann mentioned, then ReplaceText with Expression Language to set the fields, something like "${key}, ${theme}, ${x}, ${y}".

An alternative to Jolt, which Tim alluded to, is to use EvaluateJsonPath to get the p.* fields into attributes, then SplitJson, then EvaluateJsonPath to get the s.* attributes, then ReplaceText (either generating JSON or CSV as described). In my opinion I'd use the Jolt transform and keep the content in JSON as long as possible, rather than generating attributes.
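For reference, the effect of this Jolt shift plus the SplitJson/ReplaceText step can be sketched outside NiFi in plain Python. This is just an illustration of the data flow on the example input, not how the processors are implemented:

```python
import json

# The input JSON from the question.
doc = json.loads("""
{ "p": { "key": "k1", "theme": "default" },
  "version": "1.1.0",
  "s": [ { "x": 1, "y": "0.1" }, { "x": 2, "y": "0.2" } ] }
""")

# Mirror the shift spec: copy p.key and p.theme into each element of s,
# producing the top-level array shown above.
merged = [{"key": doc["p"]["key"], "theme": doc["p"]["theme"], **item}
          for item in doc["s"]]

# Mirror SplitJson + ReplaceText with "${key}, ${theme}, ${x}, ${y}".
lines = ["{key}, {theme}, {x}, {y}".format(**rec) for rec in merged]
print("\n".join(lines))
```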


3 REPLIES

Master Guru

Have you tried Jolt or EvaluateJsonPath?

Try your expression on jsonpath.com; it's great for figuring those out.

You can also process the JSON in two passes: once with the EvaluateJsonPath processor, and then in the next step use JSONPath inside each attribute's Expression Language as well.
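For the JSON in the question, the expressions to try on jsonpath.com (or as EvaluateJsonPath properties) would be something along these lines:

```
$.p.key      ->  "k1"
$.p.theme    ->  "default"
$.s[*].x     ->  [1, 2]
$.s[*].y     ->  ["0.1", "0.2"]
```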


New Contributor

I am trying to do something very similar, but I do not know what fields are going to exist on the JSON other than the one that contains the array. I'm using this in a scenario where others define & change the schema on a regular basis and the data pipeline needs to pass through the data. Our developers are using a message envelope with some common fields, and then an array of individual messages. So in my case I might have something like:

{
  "user_id": 123,
  "other_root_field": "blah",
  "parent": {
    "events": [
      {
         "nested_1": "a",
         "nested_2": "b"
      },
      {
        "nested_3": "c",
        "nested_1": "d"
      }
    ]
  }
}

What I want to do is pull out all the individual events, add the data from the envelope, and write them to Kafka (still in JSON format). Looking at the above answer, it seems like I should use the JoltTransformJSON processor, followed by a SplitJson processor, and finally a Kafka producer.

The first event from the example above would look like:

{
  "user_id": 123,
  "other_root_field": "blah",
  "exploded_nested_1": "a",
  "exploded_nested_2": "b"
}

Note that the fields from the array have an "exploded_" prefix added - this is to avoid name collision between any fields defined on the envelope and those in the individual events.
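To make the intent concrete, here is the transform I'm after sketched in plain Python (outside NiFi). The field names are just the ones from my example; the pipeline itself shouldn't hard-code any of them:

```python
import json

# Example message envelope; the schema (other than parent.events) can change.
envelope = json.loads("""
{ "user_id": 123,
  "other_root_field": "blah",
  "parent": { "events": [
    { "nested_1": "a", "nested_2": "b" },
    { "nested_3": "c", "nested_1": "d" } ] } }
""")

# Every root field except the "parent" wrapper that holds the array...
common = {k: v for k, v in envelope.items() if k != "parent"}

# ...merged into each event, whose fields get an "exploded_" prefix
# to avoid collisions with envelope fields.
exploded = [
    {**common, **{"exploded_" + k: v for k, v in event.items()}}
    for event in envelope["parent"]["events"]
]
print(json.dumps(exploded, indent=2))
```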

To get there it seems like I should produce this from Jolt:

[
  {
    "user_id": 123,
    "other_root_field": "blah",
    "exploded_nested_1": "a",
    "exploded_nested_2": "b"
  },
  {
    "user_id": 123,
    "other_root_field": "blah",
    "exploded_nested_3": "c",
    "exploded_nested_1": "d"
  }
]

I can't seem to get there from the answer above, although it seems like I should be able to.

1. I can't get Jolt to add the prefix to the fields in the array.

[{
  "operation": "shift",
  "spec": {
    "parent": {
      "events": {
        "*": {
          "@": "[exploded_&]"
        }
      }
    }
  }
}]

This gives me an error that exploded_& is an invalid index for the array. Using just [&] will output the existing field names though.

2. I can't figure out how to include fields on the root, but exclude the "parent" that holds the array.

[{
  "operation": "shift",
  "spec": {
    "parent": {
      "events": {
        "*": {
          "@3": "[&]"
        }
      }
    }
  }
}]

This gets me an array entry for every event with all of the data in each one. I need a way to say "all fields on the root except 'parent'".

Help would be greatly appreciated.

Thanks,

--Ben