09-17-2017
03:08 PM
@Yash thanks for your reply. However, my problem is that I do not know the set of fields (either on the root or inside the array elements). It is always changing, and I don't want to have to update the spec every time someone adds a field. The spec should only know about parent.events and not assume the existence of any other field. I need a way to say "copy everything at the root, except for the parent field." For the moment I have implemented the logic in Jython, although it is fairly slow.
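For reference, the pass-through logic described here can be sketched in plain Python. This is only a rough illustration, not the actual Jython script: the function name `explode_events` and its parameters (`container_key`, `array_key`, `prefix`) are made up for this example, and it assumes the envelope is already parsed into a dict. It avoids anything newer than Python 2.7 syntax so it could be adapted for Jython.

```python
import copy
import json


def explode_events(envelope, container_key="parent", array_key="events",
                   prefix="exploded_"):
    """Return one flat record per event, merged with the envelope's root fields.

    Only container_key and array_key are assumed to exist; every other
    field name is discovered at run time.
    """
    # Copy everything at the root except the container that holds the array.
    root_fields = dict((k, v) for k, v in envelope.items() if k != container_key)
    events = envelope.get(container_key, {}).get(array_key, [])

    records = []
    for event in events:
        # Deep copy so records do not share nested root values.
        record = copy.deepcopy(root_fields)
        for name, value in event.items():
            # The prefix avoids collisions with fields on the envelope.
            record[prefix + name] = value
        records.append(record)
    return records


if __name__ == "__main__":
    sample = {
        "user_id": 123,
        "other_root_field": "blah",
        "parent": {
            "events": [
                {"nested_1": "a", "nested_2": "b"},
                {"nested_3": "c", "nested_1": "d"},
            ]
        },
    }
    # One JSON document per event, ready to hand to a Kafka producer.
    for record in explode_events(sample):
        print(json.dumps(record, sort_keys=True))
```

Because the function only ever names the container and the array, adding a new field to the envelope or to an event requires no change to the code.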
09-17-2017
09:02 AM
I am trying to do something very similar, but I do not know what fields are going to exist in the JSON other than the one that contains the array. I'm using this in a scenario where others define and change the schema on a regular basis, and the data pipeline needs to pass the data through. Our developers are using a message envelope with some common fields, and then an array of individual messages. So in my case I might have something like:
{
"user_id": 123,
"other_root_field": "blah",
"parent": {
"events": [
{
"nested_1": "a",
"nested_2": "b"
},
{
"nested_3": "c",
"nested_1": "d"
}
]
}
}
What I want to do is pull out all the individual events, add the data from the envelope to each one, and write them to Kafka (still in JSON format). Looking at the above answer, it seems like I should use the JoltTransformJSON processor, followed by a SplitJson processor and finally a Kafka producer. The first event from the example above would look like:
{
"user_id": 123,
"other_root_field": "blah",
"exploded_nested_1": "a",
"exploded_nested_2": "b"
}
Note that the fields from the array have an "exploded_" prefix added. This is to avoid name collisions between fields defined on the envelope and those in the individual events. To get there, it seems like I should produce this from Jolt:
[
{
"user_id": 123,
"other_root_field": "blah",
"exploded_nested_1": "a",
"exploded_nested_2": "b"
},
{
"user_id": 123,
"other_root_field": "blah",
"exploded_nested_3": "c",
"exploded_nested_1": "d"
}
]
I can't seem to get there from the answer above, although it seems like I should.
1. I can't get Jolt to add the prefix to the fields in the array.
[{
"operation": "shift",
"spec": {
"parent": {
"events": {
"*": {
"@": "[exploded_&]"
}
}
}
}
}]
This gives me an error that exploded_& is an invalid index for the array. Using just [&] will output the existing field names, though.
2. I can't figure out how to include fields on the root, but exclude the "parent" that holds the array.
[{
"operation": "shift",
"spec": {
"parent": {
"events": {
"*": {
"@3": "[&]"
}
}
}
}
}]
This will get me an array entry for every event, with all of the data in each one. I need a way to say "all fields on the root except parent".
Help would be greatly appreciated.
Thanks,
--Ben
09-17-2017
09:02 AM
I am in a scenario where others define and change the schema on a regular basis, and the data pipeline needs to pass the data through. Our developers are using a message envelope with some common fields, and then an array of individual messages. So in my case I might have something like:
{
"user_id": 123,
"other_root_field": "blah",
"parent": {
"events": [
{
"nested_1": "a",
"nested_2": "b"
},
{
"nested_3": "c",
"nested_1": "d"
}
]
}
}
What I want to do is pull out all the individual events, add the data from the envelope to each one, and write them to Kafka (still in JSON format). It seems like I should use the JoltTransformJSON processor, followed by a SplitJson processor and finally a Kafka producer (please correct me if there is a better way). The first event from the example above would look like:
{"user_id":123,"other_root_field":"blah","exploded_nested_1":"a","exploded_nested_2":"b"}
Note that the fields from the array have an "exploded_" prefix added. This is to avoid name collisions between fields defined on the envelope and those in the individual events. To get there, it seems like I should produce this from Jolt:
[
{
"user_id": 123,
"other_root_field": "blah",
"exploded_nested_1": "a",
"exploded_nested_2": "b"
},
{
"user_id": 123,
"other_root_field": "blah",
"exploded_nested_3": "c",
"exploded_nested_1": "d"
}
]
I can't seem to quite get there, however:
1. I can't get Jolt to add the prefix to the fields in the array.
[{"operation":"shift","spec":{"parent":{"events":{"*":{"@":"[exploded_&]"}}}}}]
This gives me an error that exploded_& is an invalid index for the array. Using just [&] will output the existing field names, though.
2. I can't figure out how to include fields on the root, but exclude the "parent" that holds the array.
[{"operation":"shift","spec":{"parent":{"events":{"*":{"@3":"[&]"}}}}}]
This will get me an array entry for every event, with all of the data in each one. I need a way to say "all fields on the root except parent".
Help would be greatly appreciated.
Thanks,
--Ben
Labels: Apache NiFi