<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: NiFi JVM settings for large files. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/NiFi-JVM-settings-for-large-files/m-p/232748#M194583</link>
    <description>&lt;P&gt;You don't necessarily need a heap larger than the file. Only processors that read the entire file into memory require that, and most processors should avoid doing so unless absolutely necessary; any that do should document it.&lt;/P&gt;&lt;P&gt;In your approach of "list--&amp;gt;fetch--&amp;gt;splittext--&amp;gt;replacetext--&amp;gt;mergecontent", the issue is that you are splitting a single flow file into millions of flow files. Even though the content of all those flow files won't be in memory, that is still millions of Java objects on the heap.&lt;/P&gt;&lt;P&gt;Whenever possible, avoid this splitting approach. Use the "record" processors to manipulate the data in place and keep your 22GB as a single flow file. I don't know exactly what you need to do to each record, so I can't say for sure, but most likely all you need after your fetch processor is an UpdateRecord processor: it streams one record in, updates a field, and streams the record out, so it never loads the entire content into memory and never creates millions of flow files.&lt;/P&gt;</description>
    <pubDate>Thu, 01 Mar 2018 22:53:32 GMT</pubDate>
    <dc:creator>bbende</dc:creator>
    <dc:date>2018-03-01T22:53:32Z</dc:date>
  </channel>
</rss>
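The streaming principle the answer describes (read one record, update a field, write it out, so memory use stays constant regardless of file size) can be sketched in plain Python. This is an illustration of the idea, not NiFi's API; the CSV data, the `update_records` helper, and the field being uppercased are all hypothetical:

```python
import csv
import io

def update_records(reader, writer, update):
    """Stream records one at a time: only a single record is ever
    held in memory, no matter how large the input is."""
    for record in reader:
        writer.writerow(update(record))

# Hypothetical example: uppercase the second column of a small CSV.
# With a real 22GB file, the loop above would behave the same way,
# processing one record at a time instead of loading the whole file.
src = io.StringIO("id,name\n1,alice\n2,bob\n")
dst = io.StringIO()
update_records(csv.reader(src), csv.writer(dst),
               lambda rec: [rec[0], rec[1].upper()])
```

NiFi's UpdateRecord processor applies the same pattern internally via a configured record reader and writer, which is why it avoids both the heap pressure and the millions of flow-file objects created by the split/merge approach.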

