Created 08-21-2024 05:40 AM
Hi all
I've been using this community support a LOT lately as I get into NiFi and the ETL power it has, and I think it's incredible, but today I find myself asking for help instead, after spending far too long trying to work something out on my own 😕
I'm not too familiar with JOLT, so what I'm trying to do is take a data flow I have, which is in the following format:
[
  {
    "CODE": "1234",
    "PRODUCT": "product 1",
    "Valid": null,
    "Zone 1": "1.00 USD",
    "Zone 2": "1.00 USD",
    "Zone 3": null,
    "Zone 4": null,
    "Zone 5": null
  },
  {
    "CODE": "9876",
    "PRODUCT": "product 2",
    "Valid": null,
    "Zone 1": "18.00 USD",
    "Zone 2": "12.00 USD",
    "Zone 3": null,
    "Zone 4": null,
    "Zone 5": null
  },
  {
    "CODE": "3745",
    "PRODUCT": "product 3",
    "Valid": null,
    "Zone 1": "51.00 USD",
    "Zone 2": "21.00 USD",
    "Zone 3": null,
    "Zone 4": null,
    "Zone 5": null
  }
]
and try to convert it to something like this:
[
  {
    "code": "1234",
    "product": "product 1",
    "prices": [
      {
        "name": "Zone 1",
        "price": "1.00 USD",
        "valid": null
      },
      {
        "name": "Zone 2",
        "price": "1.00 USD",
        "valid": null
      },
      {
        "name": "Zone 3",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 4",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 5",
        "price": null,
        "valid": null
      }
    ]
  },
  {
    "code": "9876",
    "product": "product 2",
    "prices": [
      {
        "name": "Zone 1",
        "price": "18.00 USD",
        "valid": null
      },
      {
        "name": "Zone 2",
        "price": "12.00 USD",
        "valid": null
      },
      {
        "name": "Zone 3",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 4",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 5",
        "price": null,
        "valid": null
      }
    ]
  },
  {
    "code": "3745",
    "product": "product 3",
    "prices": [
      {
        "name": "Zone 1",
        "price": "51.00 USD",
        "valid": null
      },
      {
        "name": "Zone 2",
        "price": "21.00 USD",
        "valid": null
      },
      {
        "name": "Zone 3",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 4",
        "price": null,
        "valid": null
      },
      {
        "name": "Zone 5",
        "price": null,
        "valid": null
      }
    ]
  }
]
Using my very limited knowledge of JOLT (and incredible search engine skills ha ha), I've managed to get something similar in the playground by trial and error, and I'm now at a position like this:
[
  {
    "code": "1234",
    "product": "product 1",
    "price_lists": [
      {
        "Zone 1": "1.00 USD"
      },
      {
        "Zone 2": "1.00 USD"
      },
      {
        "Zone 3": null
      },
      {
        "Zone 4": null
      },
      {
        "Zone 5": null
      }
    ]
  },
  {
    "code": "9876",
    "product": "product 2",
    "price_lists": [
      {
        "Zone 1": "18.00 USD"
      },
      {
        "Zone 2": "12.00 USD"
      },
      {
        "Zone 3": null
      },
      {
        "Zone 4": null
      },
      {
        "Zone 5": null
      }
    ]
  },
  {
    "code": "3745",
    "product": "product 3",
    "price_lists": [
      {
        "Zone 1": "51.00 USD"
      },
      {
        "Zone 2": "21.00 USD"
      },
      {
        "Zone 3": null
      },
      {
        "Zone 4": null
      },
      {
        "Zone 5": null
      }
    ]
  },
  null
]
and this is the spec I have used:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "CODE": "[&1].code",
        "PRODUCT": "[&1].product",
        "Valid": "",
        "*": "[&1].price_lists[].&"
      }
    }
  }
]
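The posted spec produces one single-key object per zone. The rough plain-Python model below shows why. It is an illustration only, not how Jolt is implemented, and the function name shift_like_posted_spec is hypothetical. It also does not try to reproduce the trailing null element seen in the playground output.

```python
import json


def shift_like_posted_spec(records):
    """Model the posted spec: each zone becomes its own object in price_lists."""
    result = []
    for rec in records:
        item = {"code": rec["CODE"], "product": rec["PRODUCT"], "price_lists": []}
        for key, value in rec.items():
            if key in ("CODE", "PRODUCT", "Valid"):
                continue  # literal keys in the spec, so the "*" rule never sees them
            # "[&1].price_lists[].&": "[]" opens a NEW array element for every
            # matched key, and ".&" writes that key inside it, so each zone
            # ends up alone in its own object.
            item["price_lists"].append({key: value})
        result.append(item)
    return result


records = [{"CODE": "1234", "PRODUCT": "product 1", "Valid": None,
            "Zone 1": "1.00 USD", "Zone 2": "1.00 USD", "Zone 3": None,
            "Zone 4": None, "Zone 5": None}]
print(json.dumps(shift_like_posted_spec(records), indent=2))
```

The key point is that "[]" in an output path appends a fresh element on every match. To get name, price and valid into the same element, the attributes first have to be grouped under a common parent.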
But sadly, I think I have reached the end of my knowledge. I just can't work out how to build the price_lists array in the format I need!
I know we can reference the hierarchy and also get key names and values as data using things like "$" and "@", but I can't find a clear enough example to make sense of how it all works. I may well be going about it the wrong way, but I think this is the right approach; I just don't have the knowledge to achieve what I'd like, so I'm reaching out to this amazing community and asking (nay, begging) for help ha ha
Thanks in advance all, this is an amazing place!
Created 08-21-2024 12:01 PM
Hi @Crags,
It looks like you were close... maybe not 🙂. You were heading in the right direction, but still kind of far away 😞. The problem looks easy, but is it really?! Welcome to the world of Jolt: whenever you think you've got it, you hit a problem that makes you rethink your understanding 🙂
I don't mean to discourage you or scare you away from Jolt. I was in your shoes when I first started, when all I knew was what {"*":"&"} means. However, with practice and a lot of problem solving I got much better, and now I actually enjoy solving Jolt problems. You can read as many tutorials online or look up as many examples as you like, but I can assure you the only way to learn it is through practice.
A good place to look for the latest information is the Jolt GitHub repository. For a quick cheat sheet, I use this link.
Now back to your problem. The challenge is to group the attributes extracted for each zone together. For that, I use a first shift spec that creates each zone's attributes (name, price and valid) and stores them under the same zone parent object. Then I use a second shift to bucket each zone object into the prices array alongside its product info.
The spec will look like this:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "CODE": "[&1].code",
        "PRODUCT": "[&1].product",
        "Zone*": {
          "$": "[&2].&1.name",
          "@": "[&2].&1.price",
          "@(1,Valid)": "[&2].&1.valid"
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&",
        "Zone*": "[&1].prices[]"
      }
    }
  }
]
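To make the Jolt operators easier to read, the same two-pass idea can be modelled in plain Python. This is a sketch only. The function names first_shift and second_shift are hypothetical, and the comments map each step back to the spec:

- "$" is the matched key.
- "@" is the matched value.
- "@(1,Valid)" reads Valid from one level up, which is the enclosing record.

```python
import json


def first_shift(records):
    """Pass 1: give every zone its own parent object holding name/price/valid."""
    staged = []
    for rec in records:
        item = {"code": rec["CODE"], "product": rec["PRODUCT"]}
        for key, value in rec.items():
            if key.startswith("Zone"):          # the "Zone*" wildcard
                item[key] = {
                    "name": key,                # "$": the matched key itself
                    "price": value,             # "@": the matched value
                    "valid": rec.get("Valid"),  # "@(1,Valid)": Valid from the record
                }
        staged.append(item)
    return staged


def second_shift(staged):
    """Pass 2: append each zone object to a prices list and copy the rest as-is."""
    result = []
    for item in staged:
        out = {}
        for key, value in item.items():
            if key.startswith("Zone"):
                out.setdefault("prices", []).append(value)  # "Zone*": "[&1].prices[]"
            else:
                out[key] = value                            # "*": "[&1].&"
        result.append(out)
    return result


records = [{"CODE": "1234", "PRODUCT": "product 1", "Valid": None,
            "Zone 1": "1.00 USD", "Zone 2": "1.00 USD", "Zone 3": None,
            "Zone 4": None, "Zone 5": None}]
print(json.dumps(second_shift(first_shift(records)), indent=2))
```

Grouping first under the zone key is what lets the second pass move name, price and valid together as one array element.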
Since you are new to this, I would also consider looking into another transformation language called JSLT; NiFi has a processor for that as well. If you are familiar with the XQuery language, you will probably find it easier to learn, and sometimes the spec is much simpler, as in this scenario, where it looks like the following:
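[
  // "." is the input array; bind each record's Valid so the inner loop can use it
  for (.) let valid = .Valid
  {
    "code": .CODE,
    "product": .PRODUCT,
    // looping over an object yields {"key": ..., "value": ...} pairs
    "prices": [for (.)
      {
        "name": .key,
        "price": .value,
        "valid": $valid
      }
      if (test(.key, "Zone\\s\\d+"))
    ]
  }
]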
Make sure to remove the expression .!=null from the JSLT processor's filter property if you want null values to be kept in the result.
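For readers who want to see the JSLT logic outside NiFi, here is a rough plain-Python equivalent. It uses the key names (code, product, price) from the target format in the question. The apply_filter helper is an assumption. It models the default JSLT behaviour of dropping object keys whose value is null, {} or [], which is what the filter note above is about. Passing drop_null=False approximates removing .!=null from that filter. This is not the NiFi processor's actual implementation.

```python
import json
import re

# Same pattern as the JSLT test() call; test() is assumed to behave like re.search.
ZONE_KEY = re.compile(r"Zone\s\d+")


def apply_filter(value, drop_null=True):
    """Recursively drop object keys whose value is {} or [], and also None
    when drop_null is True (a rough model of JSLT's object filter)."""
    if isinstance(value, dict):
        kept = {}
        for k, v in value.items():
            v = apply_filter(v, drop_null)
            if v == {} or v == []:
                continue
            if drop_null and v is None:
                continue
            kept[k] = v
        return kept
    if isinstance(value, list):
        return [apply_filter(v, drop_null) for v in value]
    return value


def transform(records, drop_null=True):
    """Mirror the JSLT comprehension: one output object per input record."""
    result = [
        {
            "code": rec.get("CODE"),
            "product": rec.get("PRODUCT"),
            "prices": [
                {"name": key, "price": value, "valid": rec.get("Valid")}
                for key, value in rec.items()
                if ZONE_KEY.search(key)  # keep only keys like "Zone 1"
            ],
        }
        for rec in records
    ]
    return apply_filter(result, drop_null)


sample = [{"CODE": "1234", "PRODUCT": "product 1", "Valid": None,
           "Zone 1": "1.00 USD", "Zone 2": "1.00 USD", "Zone 3": None}]
print(json.dumps(transform(sample), indent=2))                   # nulls dropped
print(json.dumps(transform(sample, drop_null=False), indent=2))  # nulls kept
```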
Hope that helps. If it does, please accept the solution.
Created 08-21-2024 10:07 AM
@Crags Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our NiFi experts @MattWho @mburgess who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 08-22-2024 12:46 AM
Hey @SAMSAL - thanks for the detailed reply, and yep, you hit the nail on the head ha ha. I've used a few of those links already, and as you said I was close, but not quite there. I grasp the concept; it's just the syntax of everything that was throwing me off.
That being said, the JSLT example you posted does seem to make more sense to me, so that's definitely something I'll look into.
I spoke to a guy called Paul Lakus in the NiFi Slack channel, and he gave me almost exactly the same solution you posted, so I managed to work out how it was being done, but it's still a huge learning curve for me heh.
Anyway, this now seems to be working as intended, which I'm extremely grateful for, so thank you for your input and the steer towards JSLT.
Thanks!!