<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Merge Fileflow files based on time rather than size or number of entries in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140961#M56313</link>
    <description>&lt;P&gt;Also, after viewing your answer, I'm wondering if processing 24 hours' worth of data and having it stored in the JVM heap memory would be too much.  Probably.  This is unfortunate.  When we were using Flume, it would create a .tmp file that would constantly gather the data rather than storing it in memory, so you could make the files as large as you wanted.  This is not an appealing part of NiFi.&lt;/P&gt;</description>
    <pubDate>Tue, 07 Mar 2017 02:28:19 GMT</pubDate>
    <dc:creator>elloyd</dc:creator>
    <dc:date>2017-03-07T02:28:19Z</dc:date>
    <item>
      <title>Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140957#M56309</link>
      <description>&lt;P&gt;Hello, I am trying to use the MergeContent processor in NiFi to deliver a single HDFS file for a specified length of time (with no maximum number of entries), rather than using the options presented in the MergeContent processor: Number of Entries and Group Size.&lt;/P&gt;&lt;P&gt;I see there is something called Max Bin Age, but I am under the impression that bins are different from bundles, and it doesn't work as I'd hoped when I use Max Bin Size to do what I am trying to do.&lt;/P&gt;&lt;P&gt;To be clear, I am trying to deliver data into separate directories that are divided according to /year/month/day, with one file in each day directory.  I have yet to attempt to create the directory structure.&lt;/P&gt;&lt;P&gt;Thanks for any help.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 02:17:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140957#M56309</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-03-07T02:17:13Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140958#M56310</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/15261/elloyd.html" nodeid="15261"&gt;@Eric Lloyd&lt;/A&gt;&lt;P&gt;The MergeContent processor adds FlowFiles from the incoming queue to virtual bins.  Once the configured criteria on a bin are met, all the FlowFiles in that bin are merged.  &lt;/P&gt;&lt;P&gt;So if you want to continue to merge incoming FlowFiles until X amount of time has passed, then setting the "Max Bin Age" property is what you want.&lt;/P&gt;&lt;P&gt;Note:  Be careful how many FlowFiles you merge.  The FlowFile attributes for all incoming FlowFiles being merged in a single bin live in the NiFi JVM heap memory.  Merging too many FlowFiles at once can result in OutOfMemory (OOM) errors. There is no formula for the exact number you can merge per bundle/bin.  It depends on how many attributes exist on each FlowFile and how large the values associated with those attributes are.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 02:23:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140958#M56310</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-03-07T02:23:37Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140959#M56311</link>
      <description>&lt;P&gt;So is it correct to say that a bin and a bundle are the same thing?&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 02:26:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140959#M56311</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-03-07T02:26:07Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140960#M56312</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15261/elloyd.html" nodeid="15261"&gt;@Eric Lloyd&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you set an attribute on all your FlowFiles with a value of "&amp;lt;year/month/day&amp;gt;" for the FlowFile, you can use that attribute as your "Correlation Attribute Name" in the MergeContent processor to make sure that only FlowFiles from the same day are added to a bin.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 02:26:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140960#M56312</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-03-07T02:26:43Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140961#M56313</link>
      <description>&lt;P&gt;Also, after viewing your answer, I'm wondering if processing 24 hours' worth of data and having it stored in the JVM heap memory would be too much.  Probably.  This is unfortunate.  When we were using Flume, it would create a .tmp file that would constantly gather the data rather than storing it in memory, so you could make the files as large as you wanted.  This is not an appealing part of NiFi.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2017 02:28:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140961#M56313</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-03-07T02:28:19Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140962#M56314</link>
      <description>&lt;P&gt;It's interesting, because I am trying both of your methods for having the bin complete only after a period of time, and neither is working.  I have added an attribute called hour, which retrieves the yyyy-MM-dd-HH value and saves it.  Then I tell the MergeContent processor's Correlation Attribute Name property to group according to "hour".  I can see the attribute when I view the files in the queue, and the hour attribute looks correct ... it almost seems like the value in Minimum Group Size is overriding the Correlation Attribute Name.  Is there a way to tell the MergeContent processor to use ONLY the Correlation Attribute Name to judge bin size and ignore the number of entries and Group Size?&lt;/P&gt;&lt;P&gt;Attached is a screenshot of my MergeContent processor config values and a screenshot of the value of my "hour" attribute.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="13355-screen-shot-2017-03-07-at-23853-pm.png" style="width: 763px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21738iDE25BEE1817D2972/image-size/medium?v=v2&amp;amp;px=400" role="button" title="13355-screen-shot-2017-03-07-at-23853-pm.png" alt="13355-screen-shot-2017-03-07-at-23853-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="13356-screen-shot-2017-03-07-at-23938-pm.png" style="width: 472px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21739i002636C6080A027A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="13356-screen-shot-2017-03-07-at-23938-pm.png" alt="13356-screen-shot-2017-03-07-at-23938-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 08:16:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140962#M56314</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2019-08-19T08:16:48Z</dc:date>
    </item>
    <item>
      <title>Re: Merge Fileflow files based on time rather than size or number of entries</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140963#M56315</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/15261/elloyd.html" nodeid="15261"&gt;@Eric Lloyd&lt;/A&gt;&lt;P&gt;With the above configuration, it would only take 1 FlowFile to be assigned to a bin before that bin was marked eligible for merging.  There is nothing there that forces the processor to wait for other FlowFiles to be allocated to a bin before merging: both minimums are set to 1 FlowFile and 0 bytes.   In order to actually get 100,000 FlowFiles (this is high and may trigger OOM), there would need to be 100,000 FlowFiles, all with the same correlation attribute value, in the incoming connection queue at the time the processor runs.  This is almost certainly not going to be the case.&lt;/P&gt;&lt;P&gt;The Max Bin Age simply sets an exit strategy here.  Once a bin's age reaches this value, the bin is merged regardless of whether the minimums have been met.&lt;BR /&gt;&lt;BR /&gt;You may want to set more reasonable values for your minimums and also consider using multiple MergeContent processors in series to step up to the final merged number you are looking for.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jan 2018 01:07:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-Fileflow-files-based-on-time-rather-than-size-or/m-p/140963#M56315</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2018-01-17T01:07:21Z</dc:date>
    </item>
  </channel>
</rss>

