<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>NiFi - how to efficiently remove a line from a big flow file?</title>
    <link>https://community.cloudera.com/t5/Support-Questions/NiFi-how-to-remove-efficiently-a-line-from-a-big-flow-file/m-p/124719#M87463</link>
    <description>&lt;P&gt;The task may seem easy, but in fact it isn't...&lt;/P&gt;&lt;P&gt;I have a big flow file (&amp;gt;1 GB) from which I need to remove, say, the first line (the header) before further processing.&lt;/P&gt;&lt;P&gt;So far I have made 3 attempts, but none of them works as expected:&lt;/P&gt;&lt;P&gt;1) ReplaceText&lt;/P&gt;&lt;P&gt;This works for small files, but the problematic file is too big to load into memory (I get an out-of-memory exception).&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12969-capture.jpg" style="width: 790px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22711i44706C87498915FE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12969-capture.jpg" alt="12969-capture.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;2) SplitText&lt;/P&gt;&lt;P&gt;I tried to use SplitText, but due to &lt;A href="https://issues.apache.org/jira/browse/NIFI-3255" rel="nofollow noopener noreferrer" target="_blank"&gt;this issue&lt;/A&gt; I currently cannot skip the header line in this processor.&lt;/P&gt;&lt;P&gt;In other words, this processor fails whenever Header Line Count &amp;gt; 0.&lt;/P&gt;&lt;P&gt;3) ExecuteProcess&lt;/P&gt;&lt;P&gt;I can imagine running a Linux command (e.g. tail or sed) to do this job, but that requires writing the flow file to disk, which might also be costly.&lt;/P&gt;&lt;P&gt;Do you have any ideas on how this can be done more efficiently?&lt;/P&gt;&lt;P&gt;Thanks, Michal&lt;/P&gt;</description>
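The core idea behind a memory-friendly fix is to stream the content instead of loading it: read bytes until the first newline, discard them, then copy the remainder in fixed-size chunks. The plain-Java sketch below shows only that streaming logic. The class name HeaderStripper and method stripFirstLine are hypothetical. Inside NiFi, the same body would presumably go into a StreamCallback passed to ProcessSession.write in a custom processor or scripted processor; that wiring is an assumption and is not shown. Scanning for the byte 0x0A is safe for UTF-8 content, because that byte never occurs inside a multi-byte UTF-8 sequence.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a sketch of streaming header removal, not a NiFi API.
public class HeaderStripper {

    /**
     * Skips everything up to and including the first '\n' and copies the
     * remaining bytes to out. Memory use stays constant (one 64 KiB buffer)
     * regardless of how large the content is. Streams are not closed here;
     * the caller owns them.
     */
    public static void stripFirstLine(InputStream in, OutputStream out) throws IOException {
        InputStream bin = new BufferedInputStream(in);
        int b;
        // Consume the header line byte by byte (buffered, so this is cheap).
        while ((b = bin.read()) != -1) {
            if (b == '\n') {
                break;
            }
        }
        // Copy the rest of the content in fixed-size chunks.
        byte[] buffer = new byte[64 * 1024];
        int n;
        while ((n = bin.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String input = "id,name\n1,alice\n2,bob\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        stripFirstLine(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out);
        // Prints the two data rows without the "id,name" header.
        System.out.print(out.toString("UTF-8"));
    }
}
```

If the content has no newline at all, the whole input is treated as the header and the output is empty. A CRLF header line is also removed completely, because scanning stops at the trailing '\n'.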
    <pubDate>Mon, 19 Aug 2019 10:13:20 GMT</pubDate>
    <dc:creator>michal_rudko</dc:creator>
    <dc:date>2019-08-19T10:13:20Z</dc:date>
  </channel>
</rss>

