<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Error with nifi accumulation in one file in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Error-with-nifi-accumulation-in-one-file/m-p/371787#M241087</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/104679"&gt;@VLban&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;MergeContent and MergeRecord handle merging FlowFile content differently.&amp;nbsp; Since your FlowFiles already contain JSON-formatted record(s), MergeContent is not the correct processor to use.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;MergeContent does not care about the data/content format of the inbound FlowFiles (except for Avro).&amp;nbsp; With Binary Concatenation, each FlowFile's content bytes are simply written starting at the end of the previous FlowFile's content, so in the case of JSON the resulting merged FlowFile's content is no longer valid JSON.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Both processors bin FlowFiles each time the processor executes, based on its run schedule.&amp;nbsp; At the end of each bin cycle, the bins are evaluated to see whether both configured minimums (number of entries/records and size) are satisfied.&amp;nbsp; If so, the bin is merged.&amp;nbsp; Setting a maximum does not mean the bin will wait to be merged until the maximum has been reached.&amp;nbsp; So if you always want files of at least 500 MB, you would be better off setting your minimum to 500 MB and your maximum to a value a bit larger than that.&amp;nbsp; Doing so may leave a bin with, say, 480 MB binned when the next FlowFile can't be added because it would exceed the configured maximum (that FlowFile is placed in a new bin).&amp;nbsp; That is where the Max Bin Age property comes in: when set, it forces a bin to merge once the bin has existed for the configured max bin age, which keeps FlowFiles from getting stuck in these merge-based processors.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you,&lt;BR /&gt;Matt&lt;/P&gt;</description>
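    <!--
A simplified Python model of the binning rules described in the reply above may help make them concrete. It is a sketch under stated assumptions, not NiFi's actual implementation. The names BinSettings, Bin, add_flowfile, should_merge and run_cycle are hypothetical. Sizes are assumed to be bytes of inbound FlowFile content. Maximum number of Bins and Correlation Attribute Name are ignored. With the settings from the question, one cycle that brings in 20,000 FlowFiles of 10 KB each satisfies both minimums and merges at about 195 MB, well short of the 500 MB maximum.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024


@dataclass
class BinSettings:
    # Hypothetical container for the merge properties discussed in the thread.
    min_entries: int
    max_entries: int
    min_bytes: int
    max_bytes: int
    max_bin_age_s: Optional[float] = None  # None models "Max Bin Age: No value set"


@dataclass
class Bin:
    created_at: float
    count: int = 0
    total_bytes: int = 0


def add_flowfile(bins: List[Bin], size: int, now: float, s: BinSettings) -> None:
    """Put a FlowFile into the first bin that stays within BOTH maximums."""
    for b in bins:
        if b.count + 1 <= s.max_entries and b.total_bytes + size <= s.max_bytes:
            b.count += 1
            b.total_bytes += size
            return
    # It would exceed a maximum in every existing bin, so it starts a new bin.
    bins.append(Bin(created_at=now, count=1, total_bytes=size))


def should_merge(b: Bin, now: float, s: BinSettings) -> bool:
    """Merge when BOTH minimums are met, or when the bin is older than Max Bin Age."""
    mins_met = b.count >= s.min_entries and b.total_bytes >= s.min_bytes
    too_old = s.max_bin_age_s is not None and now - b.created_at >= s.max_bin_age_s
    return mins_met or too_old


def run_cycle(bins: List[Bin], incoming: List[int], now: float, s: BinSettings) -> List[int]:
    """One processor execution: bin incoming FlowFile sizes, then evaluate every bin.

    Returns the size in bytes of each merged output and drops those bins.
    """
    for size in incoming:
        add_flowfile(bins, size, now, s)
    flags = [should_merge(b, now, s) for b in bins]
    merged = [b.total_bytes for b, ready in zip(bins, flags) if ready]
    bins[:] = [b for b, ready in zip(bins, flags) if not ready]
    return merged


# Settings from the question: min 10000 entries / 100 MB, max 1000000 entries / 500 MB.
question = BinSettings(10_000, 1_000_000, 100 * MB, 500 * MB)
bins: List[Bin] = []
print(run_cycle(bins, [10 * 1024] * 20_000, now=0.0, s=question))  # [204800000], about 195 MB

# Suggested change: min 500 MB, max a bit larger, plus an illustrative 300 s Max Bin Age.
suggested = BinSettings(10_000, 1_000_000, 500 * MB, 550 * MB, max_bin_age_s=300.0)
bins = []
print(run_cycle(bins, [10 * 1024] * 20_000, now=0.0, s=suggested))  # [] (bin keeps waiting)
print(run_cycle(bins, [], now=301.0, s=suggested))  # [204800000] (forced out by age)
```

With the minimum raised to 500 MB, the same bin simply keeps waiting. Only the Max Bin Age (300 s here, an arbitrary illustrative value) forces it out. This shows why the reply pairs a higher minimum with a Max Bin Age.
-->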
    <pubDate>Tue, 30 May 2023 21:22:47 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2023-05-30T21:22:47Z</dc:date>
    <item>
      <title>Error with nifi accumulation in one file</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Error-with-nifi-accumulation-in-one-file/m-p/370858#M240835</link>
      <description>&lt;P&gt;I have two processes:&lt;BR /&gt;1. ConsumeKafkaRecord -&amp;gt; MergeRecord -&amp;gt; PutHDFS&lt;BR /&gt;2. ConsumeKafkaRecord -&amp;gt; MergeContent -&amp;gt; PutHDFS&lt;/P&gt;&lt;P&gt;When I use process 1, the output files in HDFS are readable by Spark, databases, and Python libraries without problems, but the files are no larger than 200 MB and all different sizes: although 500 MB is set, that size is never reached.&lt;/P&gt;&lt;P&gt;When I use process 2 with the same size (MB) and count parameters, the files I get are exactly 500 MB, but they cannot be opened by Spark, by any database, or by Python libraries.&lt;/P&gt;&lt;P&gt;Why is that?&lt;BR /&gt;I want files that are always 500 MB and that can also be read without problems, as in process 1.&lt;/P&gt;&lt;P&gt;MergeContent settings:&lt;/P&gt;&lt;P&gt;Merge Strategy: Bin-Packing Algorithm&lt;BR /&gt;Merge Format: Binary Concatenation&lt;BR /&gt;Attribute Strategy: Keep Only Common Attributes&lt;BR /&gt;Correlation Attribute Name: No value set&lt;BR /&gt;Minimum Number of Entries: 10000&lt;BR /&gt;Maximum Number of Entries: 1000000&lt;BR /&gt;Minimum Group Size: 100 MB&lt;BR /&gt;Maximum Group Size: 500 MB&lt;BR /&gt;Max Bin Age: No value set&lt;BR /&gt;Maximum number of Bins: 10&lt;BR /&gt;Delimiter Strategy: Text&lt;BR /&gt;Header: No value set&lt;BR /&gt;Footer: No value set&lt;BR /&gt;Demarcator: \n&lt;/P&gt;&lt;P&gt;MergeRecord settings:&lt;/P&gt;&lt;P&gt;Record Reader: JsonTreeReader&lt;BR /&gt;Record Writer: ParquetRecordSetWriter&lt;BR /&gt;Merge Strategy: Bin-Packing Algorithm&lt;BR /&gt;Correlation Attribute Name: No value set&lt;BR /&gt;Attribute Strategy: Keep Only Common Attributes&lt;BR /&gt;Minimum Number of Records: 10000&lt;BR /&gt;Maximum Number of Records: 1000000&lt;BR /&gt;Minimum Bin Size: 100 MB&lt;BR /&gt;Maximum Bin Size: 500 MB&lt;BR /&gt;Max Bin Age: No value set&lt;BR /&gt;Maximum Number of Bins: 10&lt;/P&gt;</description>
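      <!--
The MergeContent half of the problem can be reproduced in miniature with Python's standard json module. This sketch assumes each FlowFile holds a JSON array of records. That is an assumption, since the question does not show the FlowFile content. It imitates Binary Concatenation with the \n demarcator from the MergeContent settings above. The record-aware merge at the end writes JSON only to stay within the standard library. In the question, MergeRecord instead writes Parquet through ParquetRecordSetWriter. The names binary_concatenate and is_valid_json are hypothetical helpers.

```python
import json
from typing import List

# Hypothetical FlowFile contents: each one is assumed to hold a JSON array of records.
flowfile_contents: List[bytes] = [
    b'[{"id": 1, "msg": "a"}, {"id": 2, "msg": "b"}]',
    b'[{"id": 3, "msg": "c"}]',
]


def binary_concatenate(contents: List[bytes], demarcator: bytes = b"\n") -> bytes:
    """Imitate Binary Concatenation: raw bytes joined as-is, demarcator in between."""
    return demarcator.join(contents)


def is_valid_json(data: bytes) -> bool:
    """True if the whole byte string parses as a single JSON document."""
    try:
        json.loads(data)
        return True
    except json.JSONDecodeError:
        return False


merged = binary_concatenate(flowfile_contents)
print(is_valid_json(flowfile_contents[0]))  # True
print(is_valid_json(merged))  # False: two top-level arrays back to back

# Record-aware merge: parse every record, then write ONE new document.
records = [rec for content in flowfile_contents for rec in json.loads(content)]
print(is_valid_json(json.dumps(records).encode("utf-8")))  # True
```

A record-aware merge, by contrast, parses every record and writes one new document. MergeRecord takes this approach with its configured Record Reader and Record Writer.
-->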
      <pubDate>Tue, 16 May 2023 10:17:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Error-with-nifi-accumulation-in-one-file/m-p/370858#M240835</guid>
      <dc:creator>VLban</dc:creator>
      <dc:date>2023-05-16T10:17:40Z</dc:date>
    </item>
  </channel>
</rss>

