<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Help to match and remove value from array with JOLT in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379211#M243808</link>
    <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;superstar!&lt;/P&gt;</description>
    <pubDate>Mon, 20 Nov 2023 08:52:55 GMT</pubDate>
    <dc:creator>simonsig</dc:creator>
    <dc:date>2023-11-20T08:52:55Z</dc:date>
    <item>
      <title>Help to match and remove value from array with JOLT</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379182#M243798</link>
      <description>&lt;P&gt;After using JOLT for many years I still find myself fumbling my way into solutions, and now I am stuck on how to selectively remove values from a JSON array.&lt;BR /&gt;&lt;BR /&gt;The scenario I am trying to solve: I want to remove entirely any value in the array that contains an equals sign "=" or, really, any "special" character.&lt;/P&gt;&lt;P&gt;Here is the example input:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;{
  "tags": [
    "misp-pattern=\\\"Phishing - T1566\\\"",
    "circl:incident-classification=\\\"phishing\\\"",
    "IDS"
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;and the output should be:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;{
  "tags": [
    "IDS"
  ]
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For those interested: I am trying to transpose tags from MISP (a threat intelligence platform), and under certain conditions it inserts some horrible entries that I want to remove entirely.&lt;BR /&gt;&lt;BR /&gt;I understand I can do this in a few steps by extracting the values and "recasting" them back into JSON; however, I need to do it via JOLT.&lt;/P&gt;</description>
      <pubDate>Sun, 19 Nov 2023 06:59:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379182#M243798</guid>
      <dc:creator>simonsig</dc:creator>
      <dc:date>2023-11-19T06:59:07Z</dc:date>
    </item>
    <item>
      <title>Re: Help to match and remove value from array with JOLT</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379185#M243799</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/107967"&gt;@simonsig&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;There is no straightforward, generic way to do this with Jolt alone. What you are looking for involves regex manipulation, which I don't think a Jolt spec supports. Maybe at some point it will be supported through the "&lt;STRONG&gt;modify-overwrite-beta&lt;/STRONG&gt;" operation, if a regex-replace function is added to its string functions.&lt;/P&gt;&lt;P&gt;Jolt does, however, support simple wildcard matching. For example, if you use "*=*" as a left-hand-side (LHS) key in a shift spec when traversing the values of the tags array, it will match the values that contain an "=" character. You can add one such key per special character and direct the matching values to an InvalidTags array; whatever is left matches the plain "*" key and can be directed to the valid tags array. A second operation (remove) then deletes InvalidTags. The drawback is that you have to know every possible special character in advance and list each one, as in the following spec:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          // find all values with special character listed
          // below and move to InvalidTags
          "*\\\"*": {
            "$": "InvalidTags[]"
          },
          "*-*": {
            "$": "InvalidTags[]"
          },
          "*=*": {
            "$": "InvalidTags[]"
          },
          "*:*": {
            "$": "InvalidTags[]"
          },
          // Values that don't contain any of the special
          // characters above are moved to tags
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      // Remove InvalidTags
      "InvalidTags": ""
    }
  }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you can't account for all possible special characters, then you can't rely on Jolt alone. The simplest approach I can think of is to put an UpdateRecord processor before the Jolt transform. There you can use Expression Language, which supports regex replace functions, to replace every run of special characters matching the regex pattern "\W+" (any character other than an ASCII letter, digit or underscore) with a common character such as "?". Then you can use a Jolt spec like the one above, but list only the "*?*" key to move values to InvalidTags.&lt;/P&gt;&lt;P&gt;The UpdateRecord configuration will look like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="SAMSAL_0-1700410078886.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/38958i1EFB00AB40EFB1C7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="SAMSAL_0-1700410078886.png" alt="SAMSAL_0-1700410078886.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The value of the dynamic property /tags[*], whose RecordPath points to every value in the tags array, is:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;${field.value:replaceAll('\W+','?')}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: for your input, make sure the JsonRecordSetWriter "Output Grouping" property is set to "One Line Per Object", so that the record is written as a single JSON object rather than wrapped in an array, which is what the Jolt spec expects.&lt;/P&gt;&lt;P&gt;The Jolt spec in this case will be as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;[
  {
    "operation": "shift",
    "spec": {
      "tags": {
        "*": {
          "*?*": {
            "$": "InvalidTags[]"
          },
          "*": {
            "$": "tags[]"
          }
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "InvalidTags": ""
    }
  }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This way you don't have to worry about which special characters you might end up with.&lt;/P&gt;&lt;P&gt;If you find this helpful, please &lt;STRONG&gt;accept&lt;/STRONG&gt; the solution.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sun, 19 Nov 2023 16:16:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379185#M243799</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2023-11-19T16:16:37Z</dc:date>
    </item>
    <item>
      <title>Re: Help to match and remove value from array with JOLT</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379211#M243808</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;superstar!&lt;/P&gt;</description>
      <pubDate>Mon, 20 Nov 2023 08:52:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Help-to-match-and-remove-value-from-array-with-JOLT/m-p/379211#M243808</guid>
      <dc:creator>simonsig</dc:creator>
      <dc:date>2023-11-20T08:52:55Z</dc:date>
    </item>
  </channel>
</rss>

