Created on 11-18-2023 10:54 PM - edited 11-18-2023 10:59 PM
After using JOLT for many years now I still find myself fumbling my way into solutions, I am however stuck on the problem of how to selectively remove values from a JSON array.
The scenario is that I am trying to solve for is that I want to wholly remove any values in the array whereby there is an equals "=" or really any "special" character.
Here is the example input:
{
"tags": [
"misp-pattern=\\\"Phishing - T1566\\\"",
"circl:incident-classification=\\\"phishing\\\"",
"IDS"
]
}
and the output should be;
{
"tags": [
"IDS"
]
}
For those interested I am trying to transpose tags from MISP (Threat Intel platform) and under certain conditions it inserts some horrible entries I want to wholly remove them.
I understand I can do this in a few steps by extracting and "recasting" this back into JSON however I need to do it via JOLT
Created on 11-19-2023 08:14 AM - edited 11-19-2023 08:16 AM
Hi @simonsig ,
There is no straight forward generic way to do this using just jolt only. What you are looking for involves some regex manipulation that I dont think Jolt spec support. Maybe at some point it will be supported through the "modify-overwrite-beta" spec by adding regexReplace function to the string functions.
Jolt however can support simple pattern matching. For example, if you use RHS "*=*" when traversing the tags array value in the spec , it will give you values that contain "=" character. You can use this to accommodate for all possible special characters then direct values to InvalidTag object , whatever is left "*" can be directed to valid tags array. Then you can use another spec to remove the InvalidTags. The drawback of this is that you have to know all possible special characters and the list them as in the following spec:
[
{
"operation": "shift",
"spec": {
"tags": {
"*": {
// find all values with special character listed
// below and move to InvalidTags
"*\\\"*": {
"$": "InvalidTags[]"
},
"*-*": {
"$": "InvalidTags[]"
},
"*=*": {
"$": "InvalidTags[]"
},
"*:*": {
"$": "InvalidTags[]"
},
//The values that wont have any of the special
//characters above will be moved to tags
"*": {
"$": "tags[]"
}
}
}
}
},
{
"operation": "remove",
"spec": {
//Remove InvalidTags
"InvalidTags": ""
}
}
]
If you cant account for all possible special characters , then you cant just rely on Jolt . The simplest way I can think of is to use an UpdateRecord processor before Jolt where you can use Expression Language that support regex replace functions to replace all special characters using the regex pattern "\W+" with a common character like "?" then you can use the Jolt spec above but list only "*?*" values to be moved to InvalidTags.
The UpdateRecord will look like:
The value for the dynamic property /tags[*] which has the path to the tag array values:
${field.value:replaceAll('\W+','?')}
Note: Based on your input make sure the JsonRecordSetWrite OutputGrouping property is set to "One Line Per Object"
The Jolt Spec in this case will be as follows:
[
{
"operation": "shift",
"spec": {
"tags": {
"*": {
"*?*": {
"$": "InvalidTags[]"
},
"*": {
"$": "tags[]"
}
}
}
}
},
{
"operation": "remove",
"spec": {
"InvalidTags": ""
}
}
]
This way you dont have to worry about what special characters you might end up with.
If you find this helpful please accept solution.
Thanks
Created on 11-19-2023 08:14 AM - edited 11-19-2023 08:16 AM
Hi @simonsig ,
There is no straight forward generic way to do this using just jolt only. What you are looking for involves some regex manipulation that I dont think Jolt spec support. Maybe at some point it will be supported through the "modify-overwrite-beta" spec by adding regexReplace function to the string functions.
Jolt however can support simple pattern matching. For example, if you use RHS "*=*" when traversing the tags array value in the spec , it will give you values that contain "=" character. You can use this to accommodate for all possible special characters then direct values to InvalidTag object , whatever is left "*" can be directed to valid tags array. Then you can use another spec to remove the InvalidTags. The drawback of this is that you have to know all possible special characters and the list them as in the following spec:
[
{
"operation": "shift",
"spec": {
"tags": {
"*": {
// find all values with special character listed
// below and move to InvalidTags
"*\\\"*": {
"$": "InvalidTags[]"
},
"*-*": {
"$": "InvalidTags[]"
},
"*=*": {
"$": "InvalidTags[]"
},
"*:*": {
"$": "InvalidTags[]"
},
//The values that wont have any of the special
//characters above will be moved to tags
"*": {
"$": "tags[]"
}
}
}
}
},
{
"operation": "remove",
"spec": {
//Remove InvalidTags
"InvalidTags": ""
}
}
]
If you cant account for all possible special characters , then you cant just rely on Jolt . The simplest way I can think of is to use an UpdateRecord processor before Jolt where you can use Expression Language that support regex replace functions to replace all special characters using the regex pattern "\W+" with a common character like "?" then you can use the Jolt spec above but list only "*?*" values to be moved to InvalidTags.
The UpdateRecord will look like:
The value for the dynamic property /tags[*] which has the path to the tag array values:
${field.value:replaceAll('\W+','?')}
Note: Based on your input make sure the JsonRecordSetWrite OutputGrouping property is set to "One Line Per Object"
The Jolt Spec in this case will be as follows:
[
{
"operation": "shift",
"spec": {
"tags": {
"*": {
"*?*": {
"$": "InvalidTags[]"
},
"*": {
"$": "tags[]"
}
}
}
}
},
{
"operation": "remove",
"spec": {
"InvalidTags": ""
}
}
]
This way you dont have to worry about what special characters you might end up with.
If you find this helpful please accept solution.
Thanks
Created 11-20-2023 12:52 AM
Thank you @SAMSAL superstar!