Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Help to match and remove value from array with JOLT

avatar
New Contributor

After using JOLT for many years now I still find myself fumbling my way into solutions, I am however stuck on the problem of how to selectively remove values from a JSON array.

The scenario is that  I am trying to solve for is that I want to wholly remove any values in the array whereby there is an equals "=" or really any "special" character.

Here is the example input:

 

{
  "tags": [
    "misp-pattern=\\\"Phishing - T1566\\\"",
    "circl:incident-classification=\\\"phishing\\\"",
    "IDS"
  ]
}

 

and the output should be;

 

{
  "tags": [
    "IDS"
  ]
}

 

For those interested I am trying to transpose tags from MISP (Threat Intel platform) and under certain conditions it inserts some horrible entries I want to wholly remove them.

I understand I can do this in a few steps by extracting and "recasting" this back into JSON however I need to do it via JOLT

1 ACCEPTED SOLUTION

avatar

Hi @simonsig ,

There is no straight forward generic way to do this using just jolt only. What you are looking for involves some regex manipulation that I dont think Jolt spec support. Maybe at some point it will be supported through the "modify-overwrite-beta"  spec by adding regexReplace function to the string functions.

Jolt however can support simple pattern matching. For example, if you use RHS "*=*" when traversing the tags array value in the  spec , it will give you values that contain "=" character. You can use this to accommodate for all possible special characters then direct values to InvalidTag object , whatever is left "*" can be directed to valid tags array. Then you can use another spec to remove the InvalidTags. The drawback of this is that you have to know all possible special characters and the list them as in the following spec:

 

[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          // find all values with special character listed
          // below and move to InvalidTags
          "*\\\"*": {
            "$": "InvalidTags[]"
          },
          "*-*": {
            "$": "InvalidTags[]"
          },
          "*=*": {
            "$": "InvalidTags[]"
          },
          "*:*": {
            "$": "InvalidTags[]"
          },
          //The values that wont have any of the special
          //characters above will be moved to tags
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
   },
  {
    "operation": "remove",
    "spec": {
      //Remove InvalidTags 
      "InvalidTags": ""
    }
  }
]

 

If you cant account for all possible special characters , then you cant just rely on Jolt . The simplest way I can think of is to use an UpdateRecord processor before Jolt  where you can use Expression Language that support regex replace functions to replace all special characters using the regex pattern "\W+"  with a common character like "?" then you can use the Jolt spec above but list only "*?*" values  to be moved to InvalidTags.

The UpdateRecord will look like:

SAMSAL_0-1700410078886.png

The value for the dynamic property /tags[*] which has the path to the tag array values:

 

${field.value:replaceAll('\W+','?')}

 

Note: Based on your input make sure the JsonRecordSetWrite OutputGrouping property is set to "One Line Per Object"

The Jolt Spec in this case will be as follows:

 

[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          "*?*": {
            "$": "InvalidTags[]"
          },
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
   },
  {
    "operation": "remove",
    "spec": {
      "InvalidTags": ""
    }
  }
]

 

This way you dont have to worry about what special characters you might end up with.

If you find this helpful please accept solution.

Thanks

 

View solution in original post

2 REPLIES 2

avatar

Hi @simonsig ,

There is no straight forward generic way to do this using just jolt only. What you are looking for involves some regex manipulation that I dont think Jolt spec support. Maybe at some point it will be supported through the "modify-overwrite-beta"  spec by adding regexReplace function to the string functions.

Jolt however can support simple pattern matching. For example, if you use RHS "*=*" when traversing the tags array value in the  spec , it will give you values that contain "=" character. You can use this to accommodate for all possible special characters then direct values to InvalidTag object , whatever is left "*" can be directed to valid tags array. Then you can use another spec to remove the InvalidTags. The drawback of this is that you have to know all possible special characters and the list them as in the following spec:

 

[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          // find all values with special character listed
          // below and move to InvalidTags
          "*\\\"*": {
            "$": "InvalidTags[]"
          },
          "*-*": {
            "$": "InvalidTags[]"
          },
          "*=*": {
            "$": "InvalidTags[]"
          },
          "*:*": {
            "$": "InvalidTags[]"
          },
          //The values that wont have any of the special
          //characters above will be moved to tags
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
   },
  {
    "operation": "remove",
    "spec": {
      //Remove InvalidTags 
      "InvalidTags": ""
    }
  }
]

 

If you cant account for all possible special characters , then you cant just rely on Jolt . The simplest way I can think of is to use an UpdateRecord processor before Jolt  where you can use Expression Language that support regex replace functions to replace all special characters using the regex pattern "\W+"  with a common character like "?" then you can use the Jolt spec above but list only "*?*" values  to be moved to InvalidTags.

The UpdateRecord will look like:

SAMSAL_0-1700410078886.png

The value for the dynamic property /tags[*] which has the path to the tag array values:

 

${field.value:replaceAll('\W+','?')}

 

Note: Based on your input make sure the JsonRecordSetWrite OutputGrouping property is set to "One Line Per Object"

The Jolt Spec in this case will be as follows:

 

[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          "*?*": {
            "$": "InvalidTags[]"
          },
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
   },
  {
    "operation": "remove",
    "spec": {
      "InvalidTags": ""
    }
  }
]

 

This way you dont have to worry about what special characters you might end up with.

If you find this helpful please accept solution.

Thanks

 

avatar
New Contributor

Thank you @SAMSAL superstar!