How can I use the JoltTransformJson processor to come out with the same format JSON message output?

Contributor

There are two JSON inputs:

1st JSON input:

{
  "Name" : "Alex",
  "Status" : "Single",
  "Life" : [ {
    "Sport" : "Swimming",
    "Singing" : "K-box",
    "Food" : "Burger",
    "Alcohol" : "Rum"
  }, {
    "Sport" : "Boxing",
    "Singing" : "party world",
    "Food" : "Chicken Wing",
    "Alcohol" : "Whisky"
  }, {
    "Sport" : "Running",
    "Singing" : "KTV",
    "Food" : "Muffin",
    "Alcohol" : "Martel"
  }]
}

2nd JSON input:

{
  "Name" : "Alex",
  "Status" : "Single",
  "Life" : {
    "Sport" : "Swimming",
    "Singing" : "K-box",
    "Food" : "Burger",
    "Alcohol" : "Rum"
  }
}

 

These two JSON inputs should go through the same JoltTransformJson processor and come out with the following outputs:

1st JSON output:

{
  "Name" : "Alex",
  "Status" : "Single",
  "Sport" : [ "Swimming", "Boxing", "Running" ],
  "Singing" : [ "K-box", "party world", "KTV" ],
  "Food" : [ "Burger", "Chicken Wing", "Muffin" ],
  "Alcohol" : [ "Rum", "Whisky", "Martel" ]
}

 

2nd JSON output:

{
  "Name" : "Alex",
  "Status" : "Single",
  "Sport" : [ "Swimming" ],
  "Singing" : [ "K-box" ],
  "Food" : [ "Burger" ],
  "Alcohol" : [ "Rum" ]
}

 

How can I configure the JoltTransformJson processor to get the above output? Or is there any other way to do it? Please advise with steps and an example. Much appreciated.

 

3 REPLIES

Contributor

We can use the Chain DSL with the JOLT specification below:

1. Normalize the two different JSON input types into the same JSON format during phase 1. The phase-1 shift looks like this:
[{
	"operation": "shift",
	"spec": {
		"*": "&", --Comments: this keeps the Name and Status elements in their original positions.
		"Life": {
			"Sport": {
				"@1": "Life.[]" --Comments: this handles Life arriving as a single object; "@1" grabs the whole Life object and appends it to a new Life array.
			},
			"0": {
				"@1": {
					"*": "&3.[&]" --Comments: if Life arrives as an array, its keys are indices, so "0" matches; the rest of the shift writes each element back so that Life stays an array as in the original.
				}
			}
		}
	}
}]
NOTE: the "--Comments: ..." text in the above code is only intended as personal commentary. Please do not treat it as valid JOLT spec notation; the spec will not pass JOLT spec validation if those characters are included.

2. The second shift (phase 2 of the chain) takes over the rest of the transformation, flattening the sub-elements of the (now array-typed) Life and grouping the values of each key into an array. The "[]" in "@": "&1[]" forces array output even when a key has only a single value; to verify this, manually remove the "[]" from "@": "&1[]" in the phase-2 shift and compare the output.
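
For illustration, this is the intermediate JSON that the phase-1 shift should produce for the 2nd input (a hand-traced sketch of the spec, not output captured from a running flow; the 1st input passes through with its Life array unchanged):

{
  "Name" : "Alex",
  "Status" : "Single",
  "Life" : [ {
    "Sport" : "Swimming",
    "Singing" : "K-box",
    "Food" : "Burger",
    "Alcohol" : "Rum"
  } ]
}

Phase 2 then only has to handle one shape: Life as an array of objects.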

=========================================
The whole content of the JOLT spec is listed below:
=========================================

 

 

[{
	"operation": "shift",
	"spec": {
		"*": "&",
		"Life": {
			"Sport": {
				"@1": "Life.[]"
			},
			"0": {
				"@1": {
					"*": "&3.[&]"
				}
			}
		}
	}
}, {
	"operation": "shift",
	"spec": {
		"Life": {
			"*": {
				"*": {
					"@": "&1[]"
				}
			}
		},
		"*": "&"
	}
}]
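
To apply this in NiFi, the chained spec above goes into a single JoltTransformJSON processor. A minimal configuration sketch (property names are from the stock processor; please verify against your NiFi version):

    Jolt Transformation DSL : Chain
    Jolt Specification      : <the full two-operation spec above>

Both sample inputs can then be routed through this one processor. The processor's Advanced UI (or any offline Jolt tester) can be used to validate the spec against both inputs before running the flow.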

 

 

 

Hope this helps.
Thanks

Contributor

For other ways to handle this, you can also try the following:
1. Add a router (RouteOnAttribute; before that, you may need to extract the Life element from the incoming raw JSON into a FlowFile attribute), then dispatch the two different input types into separate flows, each with its own transformation. A configuration sketch follows this list.

2. Alternatively, write your own script code to implement the transformation.
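
A minimal sketch of the routing approach in step 1, assuming the stock EvaluateJsonPath and RouteOnAttribute processors (the attribute and property names below are illustrative, not a tested flow):

    EvaluateJsonPath (Destination: flowfile-attribute, Return Type: json)
        life : $.Life

    RouteOnAttribute
        life.is.array : ${life:startsWith('[')}

FlowFiles matching life.is.array carry the array form of Life; the unmatched relationship carries the single-object form. Each branch can then get its own transformation. Note the caveat in the reply below: copying the whole Life element into an attribute can be costly for large payloads.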

Super Mentor

@OliverGong 

I would avoid, where possible, dataflow designs that extract the entire content of a FlowFile into FlowFile attribute(s). FlowFile content only exists on disk (unless read into memory by a processor during processing), but FlowFile attributes are held in NiFi's JVM heap memory all the time (per-connection swapping happens only when a specific connection reaches the swap threshold set in the nifi.properties file). FlowFiles with many attributes and/or large attribute values consume considerable JVM heap, which can lead to JVM Out Of Memory (OOM) exceptions, long stop-the-world JVM Garbage Collection (GC) events, etc. When options exist that avoid adding large attributes, those should be used.
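
For reference, the per-connection swap threshold mentioned above is controlled by this setting in nifi.properties (the value shown is the stock default; check your own installation):

    nifi.queue.swap.threshold=20000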

Thanks,

Matt