<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Nifi attribute containing large text value in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190517#M80551</link>
    <description>&lt;P&gt;In that case, if you just need to replace ESC with |, use ReplaceText with the Line-by-Line evaluation mode (with either Regex Replace or Literal Replace as the replacement strategy; one or the other, or both, should work) to replace \x1b with |.&lt;/P&gt;</description>
    <pubDate>Fri, 13 Jul 2018 06:23:10 GMT</pubDate>
    <dc:creator>mburgess</dc:creator>
    <dc:date>2018-07-13T06:23:10Z</dc:date>
    <item>
      <title>Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190513#M80547</link>
      <description>&lt;P&gt;Hi, dear Experts!&lt;/P&gt;&lt;P&gt;Could you please help with the following issue:&lt;/P&gt;&lt;P&gt;I have an ExtractText processor that processes a JSON flow file and &lt;STRONG&gt;creates two attributes containing large JSON text&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;mainJSON = ^.*(?=\x1b)|^((?!\x1b).)*$)&lt;/P&gt;&lt;P&gt;enrichmentJSON = ((?&amp;lt;=\x1b).*)&lt;/P&gt;&lt;P&gt;As output, three attributes were created for each of mainJSON and enrichmentJSON, for example:&lt;/P&gt;&lt;P&gt;mainJSON: mainJSON, mainJSON.0, mainJSON.1&lt;/P&gt;&lt;P&gt;Each of them contains the same portion of the expected result.&lt;/P&gt;&lt;P&gt;Is it not possible to store a large text value in an attribute? Is there another way to store and pass a large text value as an attribute?&lt;/P&gt;&lt;P&gt;As a next step, I want to combine these two attributes with other attributes in a ReplaceText processor and put them into a Hive table as separate columns of one row.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;BR /&gt;&lt;IMG src="https://community.cloudera.com/t5/image/serverpage/image-id/6883iAE66B30862CEF80B/image-size/large?v=1.0&amp;amp;px=999" border="0" alt="attrib.jpg" title="attrib.jpg" /&gt;</description>
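      <!--
      Editorial sketch (not part of the original post): this is a minimal Python illustration of what the two ExtractText patterns in the question match on a single record line. It rests on three assumptions. First, the trailing ")" in the posted mainJSON pattern is a typo and is dropped here. Second, each record sits on one line. Third, Python's re module stands in for the Java regex engine NiFi uses; both understand the lookaround and \x1b syntax used here. The helper name extract_fields is hypothetical.

```python
import re

# Patterns from the question; the stray trailing ")" of mainJSON is assumed to be a typo.
MAIN_JSON = re.compile(r"^.*(?=\x1b)|^((?!\x1b).)*$")
ENRICHMENT_JSON = re.compile(r"((?<=\x1b).*)")


def extract_fields(line):
    """Hypothetical helper: return (main, enrichment) for one record line."""
    main = MAIN_JSON.search(line)
    enrichment = ENRICHMENT_JSON.search(line)
    return (
        main.group(0) if main else None,  # whole match, i.e. capture group 0
        enrichment.group(1) if enrichment else None,  # text after the ESC
    )


print(extract_fields('{"a": 1}\x1b{"b": 2}'))  # ('{"a": 1}', '{"b": 2}')
print(extract_fields('{"a": 1}'))  # ('{"a": 1}', None)
```

As we read the ExtractText documentation, the unsuffixed attribute receives the first capture group. Indexed attributes are then added per capture group: name.0 for the whole match when capture group 0 is included, then name.1 and so on. This is consistent with the mainJSON, mainJSON.0 and mainJSON.1 attributes in the question.
      -->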
      <pubDate>Wed, 11 Jul 2018 19:12:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190513#M80547</guid>
      <dc:creator>a_gulshani</dc:creator>
      <dc:date>2018-07-11T19:12:19Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190514#M80548</link>
      <description>&lt;P&gt;It is usually not recommended to store large values in attributes, as they are kept in memory, which can cause issues for the entire flow. Can you share an example of the JSON and what you're trying to get as a result? You might be able to use UpdateRecord to create the new fields in place (i.e. in the flow file contents) rather than having to extract fields into attributes.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 20:21:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190514#M80548</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2018-07-11T20:21:09Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190515#M80549</link>
      <description>&lt;P&gt;Hi, &lt;A rel="user" href="https://community.cloudera.com/users/641/mburgess.html" nodeid="641"&gt;@Matt Burgess&lt;/A&gt;!&lt;/P&gt;&lt;P&gt;My sample formatted JSON file content  (in original flow file JSON objects are separated by new line)&lt;/P&gt;&lt;PRE&gt;{
  "schemaNameSpace": "CPMCDM.com.bis.bss.cpm.event.schema",
  "schemaName": "CpmCustomerChangeEvent",
  "schemaVersion": "5.1.1",
  "eventHeader": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.EventHeader": {
      "eventCreationTime": {
        "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.Time": {
          "timestamp": {
            "string": "2018-04-03T23:08:38.652+03:00"
          },
          "timeZoneType": {
            "string": "SYSTEM_TIME_ZONE"
          },
          "zoneName": {
            "string": "Europe/Kiev"
          }
        }
      },
      "cpmInstanceHost": {
        "string": "env6-cpm1.dbss.bis.ua"
      },
      "recordUniqueId": {
        "string": "77ECA557BFA74C98B2792222C9C72CED"
      }
    }
  },
  "customerInformation": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.CustomerInformation": {
      "customerId": {
        "string": "5A834B1C27DE4FAF9F038B370AA3DDA4"
      },
      "partyId": null
    }
  },
  "genericInterfaceParameters": null,
  "requestInfo": null,
  "partyChangeResult": null
}&amp;#27;{
  "schemaNameSpace": "com.bis.bss.edm.eventDataEnrichment.schema",
  "schemaName": "EventDataEnrichment",
  "schemaVersion": "1.0.0",
  "enrichedData": [
    
  ]
}
{
  "schemaNameSpace": "CPMCDM.com.bis.bss.cpm.event.schema",
  "schemaName": "CpmCustomerChangeEvent",
  "schemaVersion": "5.1.1",
  "eventHeader": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.EventHeader": {
      "eventCreationTime": {
        "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.Time": {
          "timestamp": {
            "string": "2018-04-03T23:08:39.652+03:10"
          },
          "timeZoneType": {
            "string": "SYSTEM_TIME_ZONE"
          },
          "zoneName": {
            "string": "Europe/Kiev"
          }
        }
      },
      "cpmInstanceHost": {
        "string": "env6-cpm1.dbss.bis.ua"
      },
      "recordUniqueId": {
        "string": "72DEA157BFA74C98B2792222C0C11CBE"
      }
    }
  },
  "customerInformation": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.CustomerInformation": {
      "customerId": {
        "string": "1E2234B1C27DE4FAF9F038B370AA3DBE4"
      },
      "partyId": null
    }
  },
  "genericInterfaceParameters": null,
  "requestInfo": null,
  "partyChangeResult": null
}
{
  "schemaNameSpace": "CPMCDM.com.bis.bss.cpm.event.schema",
  "schemaName": "CpmCustomerChangeEvent",
  "schemaVersion": "5.1.1",
  "eventHeader": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.EventHeader": {
      "eventCreationTime": {
        "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.Time": {
          "timestamp": {
            "string": "2018-04-03T23:08:40.652+02:20"
          },
          "timeZoneType": {
            "string": "SYSTEM_TIME_ZONE"
          },
          "zoneName": {
            "string": "Europe/Kiev"
          }
        }
      },
      "cpmInstanceHost": {
        "string": "env6-cpm1.dbss.bis.ua"
      },
      "recordUniqueId": {
        "string": "55CBA557BFA74C98B2792222C9A11CDE"
      }
    }
  },
  "customerInformation": {
    "CPMCDM.com.bis.bss.cpm.event.schema.cpmCustomerChangeEvent.CustomerInformation": {
      "customerId": {
        "string": "1E244B1C27DE4FAF9F038B370AA3DDD5"
      },
      "partyId": null
    }
  },
  "genericInterfaceParameters": null,
  "requestInfo": null,
  "partyChangeResult": null
}&amp;#27;{
  "schemaNameSpace": "com.bis.bss.edm.eventDataEnrichment.schema",
  "schemaName": "EventDataEnrichment",
  "schemaVersion": "1.0.0",
  "enrichedData": [
    
  ]
}
&lt;/PRE&gt;&lt;P&gt;The important thing to notice here is that some of the JSON objects in the flow file have &lt;STRONG&gt;an extension separated by ESC (\x1b):&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;&amp;#27;{
  "schemaNameSpace": "com.bis.bss.edm.eventDataEnrichment.schema",
  "schemaName": "EventDataEnrichment",
  "schemaVersion": "1.0.0",
  "enrichedData": [
    
  ]
}&lt;/PRE&gt;&lt;P&gt;The overall structure looks like this:&lt;/P&gt;&lt;PRE&gt;{mainJSON}&amp;#27;{extensionJSON}&lt;BR /&gt;{mainJSON}&lt;BR /&gt;{mainJSON}&amp;#27;{extensionJSON}
.....&lt;/PRE&gt;&lt;P&gt;As output, I would like the following format:&lt;/P&gt;&lt;P&gt;mainJSON | extensionJSON&lt;BR /&gt;mainJSON |&lt;BR /&gt;mainJSON | extensionJSON&lt;BR /&gt;etc.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
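      <!--
      Editorial sketch (not part of the original post): this sketch assumes each record occupies one line, as stated above, with the optional extension joined by a single ESC (\x1b). Under that assumption, the requested "mainJSON | extensionJSON" rows can be produced by splitting each line at its first ESC. The function name to_pipe_rows is hypothetical, and this is plain Python rather than a NiFi processor.

```python
ESC = "\x1b"


def to_pipe_rows(content, delimiter="|"):
    """Hypothetical helper: turn '{main}ESC{ext}' and '{main}' lines into 'main|ext' rows."""
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        # partition splits at the first ESC only; with no ESC, ext is "",
        # so every row still has two fields.
        main, _, ext = line.partition(ESC)
        rows.append(main + delimiter + ext)
    return rows


sample = '{"id": 1}\x1b{"ext": []}\n{"id": 2}\n{"id": 3}\x1b{"ext": []}\n'
for row in to_pipe_rows(sample):
    print(row)
```

Splitting at the first ESC only keeps any later ESC characters inside the extension column rather than creating extra columns.
      -->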
      <pubDate>Thu, 12 Jul 2018 11:25:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190515#M80549</guid>
      <dc:creator>a_gulshani</dc:creator>
      <dc:date>2018-07-12T11:25:12Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190516#M80550</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/641/mburgess.html"&gt;@Matt Burgess&lt;/A&gt;&lt;/P&gt;&lt;P&gt;So it is needed to replace &lt;STRONG&gt;ESC &lt;/STRONG&gt;(&lt;STRONG&gt;\x1b&lt;/STRONG&gt;) with delimiter: '|' using UpdateRecord processor. &lt;/P&gt;&lt;P&gt;Could you please help to configure this processor to impement this replacement in flow file records!&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jul 2018 12:49:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190516#M80550</guid>
      <dc:creator>a_gulshani</dc:creator>
      <dc:date>2018-07-12T12:49:04Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190517#M80551</link>
      <description>&lt;P&gt;In that case, if you just need to replace ESC with |, use ReplaceText with the Line-by-Line evaluation mode (with either Regex Replace or Literal Replace as the replacement strategy; one or the other, or both, should work) to replace \x1b with |.&lt;/P&gt;</description>
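      <!--
      Editorial sketch (not from the thread): we assume the reason only one of the two replacement strategies may work is escape handling. A regex engine reads the four characters \x1b as the ESC character. A purely literal search instead looks for a backslash followed by "x1b". We have not verified whether ReplaceText's Literal Replace interprets such escapes. The Python below only illustrates the difference, applying the replacement line by line. The function name replace_line_by_line is hypothetical.

```python
import re

ESC_PATTERN = r"\x1b"  # four characters: backslash, x, 1, b


def replace_line_by_line(content, use_regex):
    """Hypothetical emulation of a per-line search for ESC_PATTERN, replaced with '|'."""
    out = []
    for line in content.splitlines(keepends=True):
        if use_regex:
            # The regex engine reads \x1b as the single ESC character.
            out.append(re.sub(ESC_PATTERN, "|", line))
        else:
            # A purely literal search looks for the four characters themselves.
            out.append(line.replace(ESC_PATTERN, "|"))
    return "".join(out)


content = '{"main": 1}\x1b{"ext": 2}\n{"main": 3}\n'
print(replace_line_by_line(content, use_regex=True), end="")
print(replace_line_by_line(content, use_regex=False) == content)  # True
```

Lines without an ESC pass through unchanged in both cases, so they end up with a single column.
      -->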
      <pubDate>Fri, 13 Jul 2018 06:23:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190517#M80551</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2018-07-13T06:23:10Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190518#M80552</link>
      <description>&lt;P&gt;Note that since your format is neither JSON nor JSON-per-line, you will have to do further processing before using any processors (record-based or not) that handle JSON. As of NiFi 1.7.0 (via &lt;A href="https://issues.apache.org/jira/browse/NIFI-4456" target="_blank"&gt;NIFI-4456&lt;/A&gt;), the JsonTreeReader (and writer) allow for JSON-per-line, but your format is not exactly that either. If the existing processors or controller services (i.e. readers/writers) don't work, you might have to resort to a ScriptedRecordReader/Writer or a scripting processor to do custom handling.&lt;/P&gt;</description>
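      <!--
      Editorial sketch (not part of the original reply): one quick way to see why a standard JSON or JSON-per-line reader rejects these lines is to parse one with Python's json module. The whole line fails with an "Extra data" error because a second object follows the first after the ESC. Each side parses on its own once the line is split at ESC. The helper parse_record and the shortened sample line are hypothetical.

```python
import json

ESC = "\x1b"
line = '{"schemaName": "CpmCustomerChangeEvent"}\x1b{"schemaName": "EventDataEnrichment"}'


def parse_record(text):
    """Hypothetical helper: parse '{main}' or '{main}ESC{ext}' into (main, ext)."""
    main_text, sep, ext_text = text.partition(ESC)
    main = json.loads(main_text)
    ext = json.loads(ext_text) if sep else None  # no ESC means no extension
    return main, ext


try:
    json.loads(line)  # the whole line is not a single JSON document
except json.JSONDecodeError as err:
    print("not valid JSON:", err.msg)  # not valid JSON: Extra data

main, ext = parse_record(line)
print(main["schemaName"], ext["schemaName"])
```

A custom reader or scripted processor, as suggested above, would need to do this kind of split before any JSON parsing.
      -->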
      <pubDate>Fri, 13 Jul 2018 06:25:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190518#M80552</guid>
      <dc:creator>mburgess</dc:creator>
      <dc:date>2018-07-13T06:25:50Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi attribute containing large text value</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190519#M80553</link>
      <description>&lt;P&gt;If you use large attributes, you will have serious issues with the "snapshot" file in the FlowFile repository. I killed my PROD instance this way last week: the snapshot was too big to fit in memory at startup, and my data was lost.&lt;/P&gt;</description>
      <pubDate>Sun, 15 Jul 2018 23:38:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-attribute-containing-large-text-value/m-p/190519#M80553</guid>
      <dc:creator>mermillod</dc:creator>
      <dc:date>2018-07-15T23:38:00Z</dc:date>
    </item>
  </channel>
</rss>

