<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Unusual data placement on file rollover in Nifi - HDFS in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219644#M60418</link>
    <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15065-screen-shot-2017-05-04-at-24133-pm.png" style="width: 892px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15963iA8CD1E8A60A84291/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15065-screen-shot-2017-05-04-at-24133-pm.png" alt="15065-screen-shot-2017-05-04-at-24133-pm.png" /&gt;&lt;/span&gt;I suspect that there is a connection between the number of messages being sent and Run Duration in our ExtractText processor (see screenshot)&lt;/P&gt;&lt;P&gt;This is why:&lt;/P&gt;&lt;P&gt;at 10,000 messages being sent to the Kafka topic / second for total of 1,000,000 we always see the odd displaced data in the filename without the minute on it no matter if the Run Duration is 500 ms, 1 s, or 2 s.  (also we changed this from the lowest value because it was causing intermittent data loss)&lt;/P&gt;&lt;P&gt;at 1,000 message / second for total of 100,000 if we set the Run Duration to 1 s, the files are perfect, the way we want them.&lt;/P&gt;&lt;P&gt;Our ultimate use case is to send messages more than 10,000 / second (considerably) so maybe this will help shed some light.&lt;/P&gt;</description>
    <pubDate>Sun, 18 Aug 2019 02:37:08 GMT</pubDate>
    <dc:creator>elloyd</dc:creator>
    <dc:date>2019-08-18T02:37:08Z</dc:date>
    <item>
      <title>Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219639#M60413</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15016-screen-shot-2017-05-03-at-125256-pm.png" style="width: 771px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15965i6DC834BD4FD03BC5/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15016-screen-shot-2017-05-03-at-125256-pm.png" alt="15016-screen-shot-2017-05-03-at-125256-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15017-screen-shot-2017-05-03-at-125316-pm.png" style="width: 1229px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15966i07642209D68DC4F3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15017-screen-shot-2017-05-03-at-125316-pm.png" alt="15017-screen-shot-2017-05-03-at-125316-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15018-screen-shot-2017-05-03-at-125324-pm.png" style="width: 956px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15967i3B296C3D1A14A490/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15018-screen-shot-2017-05-03-at-125324-pm.png" alt="15018-screen-shot-2017-05-03-at-125324-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15019-screen-shot-2017-05-03-at-125331-pm.png" style="width: 467px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15968i27FA4BD7A142B822/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15019-screen-shot-2017-05-03-at-125331-pm.png" alt="15019-screen-shot-2017-05-03-at-125331-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15020-screen-shot-2017-05-03-at-125337-pm.png" style="width: 1165px;"&gt;&lt;img 
src="https://community.cloudera.com/t5/image/serverpage/image-id/15969i32CFCB404ECE26A3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15020-screen-shot-2017-05-03-at-125337-pm.png" alt="15020-screen-shot-2017-05-03-at-125337-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Here is our setup.&lt;/P&gt;&lt;P&gt;Server 1: TailFile -&amp;gt; PublishKafka&lt;/P&gt;&lt;P&gt;Server 2: ConsumeKafka -&amp;gt; ExtractText -&amp;gt; UpdateAttribute -&amp;gt; MergeContent -&amp;gt; UpdateAttribute (create filename) -&amp;gt; PutHDFS&lt;/P&gt;&lt;P&gt;We use ExtractText to parse the timestamp out of the incoming data and store its parts as attributes, which we then use to build our filenames and HDFS directories in this format:&lt;/P&gt;&lt;P&gt;Examples: May_03_16_39 (May 3 at 16:39)&lt;/P&gt;&lt;P&gt;May_03_16_40 (May 3 at 16:40)&lt;/P&gt;&lt;P&gt;May_03_16_41 (May 3 at 16:41)&lt;/P&gt;&lt;P&gt;Our directory structure goes down to the minute: 2017/May/03/16/39&lt;/P&gt;&lt;P&gt;What we see is that at each minute rollover, a few seconds of data from the end of one minute and a few seconds from the start of the next are written to a file named May_03_16_ (with no minute).&lt;/P&gt;&lt;P&gt;Please see the screenshots of the file-structure output, the PutHDFS configuration, and the UpdateAttribute (create filename) configuration. If anything else would help, let me know.&lt;/P&gt;&lt;P&gt;We use the append option in PutHDFS so that all data for the same minute is written to a single file.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:37:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219639#M60413</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2019-08-18T02:37:45Z</dc:date>
    </item>
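    <!--
A minimal sketch of the per-minute naming scheme described in the question above, written in Python rather than NiFi Expression Language. It assumes the "2017-05-04 17:15:14,655" timestamp layout quoted later in the thread and English month abbreviations to match the May_03_16_39 examples; minute_key and MONTHS are hypothetical helpers, not NiFi APIs.

```python
from datetime import datetime

# English month abbreviations (assumption), matching the "May_03_16_39" examples.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def minute_key(ts: str) -> tuple[str, str]:
    """Return (hdfs_directory, filename) for one log timestamp.

    Assumes the "2017-05-04 17:15:14,655" layout; seconds and milliseconds
    are parsed but ignored, so every record in a given minute maps to the
    same directory and file.
    """
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
    mon = MONTHS[t.month - 1]
    directory = f"{t.year}/{mon}/{t.day:02d}/{t.hour:02d}/{t.minute:02d}"
    filename = f"{mon}_{t.day:02d}_{t.hour:02d}_{t.minute:02d}"
    return directory, filename
```

Because seconds are discarded, records stamped 16:39:58 and 16:40:01 land in May_03_16_39 and May_03_16_40 respectively. Under this scheme a name ending in an underscore, like May_03_16_, would suggest that the minute value was empty by the time the filename was built.
    -->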
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219640#M60414</link>
      <description>&lt;P&gt;It looks like you are using $now rather than the syslog timestamp for the rollover. Is there a reason for that? I would expect you to use the source (syslog) timestamp for the rollover, so that each HDFS file contains only matching timestamps.&lt;/P&gt;</description>
      <pubDate>Thu, 04 May 2017 00:07:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219640#M60414</guid>
      <dc:creator>wbekker</dc:creator>
      <dc:date>2017-05-04T00:07:20Z</dc:date>
    </item>
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219641#M60415</link>
      <description>&lt;P&gt;Thanks for the idea. I corrected that, but I am still seeing the same behavior.&lt;/P&gt;</description>
      <pubDate>Thu, 04 May 2017 02:42:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219641#M60415</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-05-04T02:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219642#M60416</link>
      <description>&lt;P&gt;Can you post the full date format you are using?&lt;/P&gt;</description>
      <pubDate>Thu, 04 May 2017 02:47:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219642#M60416</guid>
      <dc:creator>wbekker</dc:creator>
      <dc:date>2017-05-04T02:47:57Z</dc:date>
    </item>
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219643#M60417</link>
      <description>&lt;P&gt;Are you referring to the expression used to split the date into directories in PutHDFS?&lt;/P&gt;&lt;P&gt;/topics/minifitest/${allAttributes("syslog_year", "syslog_month", "syslog_day", "syslog_hour"):join("/")}&lt;/P&gt;&lt;P&gt;Here is an example of our date format: 2017-05-04 17:15:14,655&lt;/P&gt;&lt;P&gt;We split it up into attributes: 2017 into syslog_year, 05 into syslog_month, 04 into syslog_day, 17 into syslog_hour, 15 into syslog_minute, and so on.&lt;/P&gt;&lt;P&gt;We then use this expression to generate the filename:&lt;/P&gt;&lt;P&gt;${allAttributes("syslog_year", "syslog_month", "syslog_day", "syslog_hour", "syslog_minute"):join("_")}&lt;/P&gt;&lt;P&gt;Everything parses into the correct directories, and data covering three minutes ends up in three correctly named folders (see screenshot), but some chunks end up in a file whose name is missing the minute.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15064-screen-shot-2017-05-04-at-22335-pm.png" style="width: 463px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15964i8C6E8D890729053C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15064-screen-shot-2017-05-04-at-22335-pm.png" alt="15064-screen-shot-2017-05-04-at-22335-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:37:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219643#M60417</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2019-08-18T02:37:16Z</dc:date>
    </item>
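    <!--
A rough Python sketch of the two Expression Language joins quoted in the reply above: a regex with named groups stands in for ExtractText, and join_attributes is a hypothetical stand-in for allAttributes(...):join(sep). It assumes that a missing syslog_minute attribute contributes an empty string, which would produce the trailing-underscore filename shape reported in this thread; that is an assumption about NiFi's behavior, not something confirmed here.

```python
import re

# Stand-in for the ExtractText step; group names match the thread's attribute names.
TIMESTAMP = re.compile(
    r"(?P<syslog_year>\d{4})-(?P<syslog_month>\d{2})-(?P<syslog_day>\d{2}) "
    r"(?P<syslog_hour>\d{2}):(?P<syslog_minute>\d{2}):(?P<syslog_second>\d{2}),\d{3}"
)

DIR_ATTRS = ["syslog_year", "syslog_month", "syslog_day", "syslog_hour"]
FILE_ATTRS = DIR_ATTRS + ["syslog_minute"]


def extract_attributes(line: str) -> dict[str, str]:
    """Return the syslog_* attributes found in a log line, or {} if none match."""
    match = TIMESTAMP.search(line)
    return match.groupdict() if match else {}


def join_attributes(attrs: dict[str, str], names: list[str], sep: str) -> str:
    """Hypothetical emulation of allAttributes(...):join(sep).

    Assumption: an absent attribute is rendered as an empty string.
    """
    return sep.join(attrs.get(name, "") for name in names)


def hdfs_target(attrs: dict[str, str]) -> tuple[str, str]:
    """Return (directory, filename) as built by the two expressions in the thread."""
    directory = "/topics/minifitest/" + join_attributes(attrs, DIR_ATTRS, "/")
    filename = join_attributes(attrs, FILE_ATTRS, "_")
    return directory, filename
```

For 2017-05-04 17:15:14,655 this yields /topics/minifitest/2017/05/04/17 and 2017_05_04_17_15; with syslog_minute removed, the filename becomes 2017_05_04_17_ while the directory is unchanged, which matches the pattern of correct directories plus a minute-less file described above.
    -->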
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219644#M60418</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="15065-screen-shot-2017-05-04-at-24133-pm.png" style="width: 892px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/15963iA8CD1E8A60A84291/image-size/medium?v=v2&amp;amp;px=400" role="button" title="15065-screen-shot-2017-05-04-at-24133-pm.png" alt="15065-screen-shot-2017-05-04-at-24133-pm.png" /&gt;&lt;/span&gt;I suspect that there is a connection between the number of messages being sent and Run Duration in our ExtractText processor (see screenshot)&lt;/P&gt;&lt;P&gt;This is why:&lt;/P&gt;&lt;P&gt;at 10,000 messages being sent to the Kafka topic / second for total of 1,000,000 we always see the odd displaced data in the filename without the minute on it no matter if the Run Duration is 500 ms, 1 s, or 2 s.  (also we changed this from the lowest value because it was causing intermittent data loss)&lt;/P&gt;&lt;P&gt;at 1,000 message / second for total of 100,000 if we set the Run Duration to 1 s, the files are perfect, the way we want them.&lt;/P&gt;&lt;P&gt;Our ultimate use case is to send messages more than 10,000 / second (considerably) so maybe this will help shed some light.&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 02:37:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219644#M60418</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2019-08-18T02:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219645#M60419</link>
      <description>&lt;P&gt;I suspect there is a connection between the message rate and the Run Duration setting of our ExtractText processor (see screenshot).&lt;/P&gt;&lt;P&gt;Here is why:&lt;/P&gt;&lt;P&gt;At 10,000 messages/second sent to the Kafka topic (1,000,000 messages in total), we always see stray data in the file whose name is missing the minute, whether Run Duration is 500 ms, 1 s, or 2 s. (We raised Run Duration from its lowest value because that setting was causing intermittent data loss.)&lt;/P&gt;&lt;P&gt;At 1,000 messages/second (100,000 in total) with Run Duration set to 1 s, the files come out exactly as we want them.&lt;/P&gt;&lt;P&gt;Our target use case is considerably more than 10,000 messages/second, so this may help shed some light.&lt;/P&gt;</description>
      <pubDate>Fri, 05 May 2017 01:46:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219645#M60419</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-05-05T01:46:32Z</dc:date>
    </item>
    <item>
      <title>Re: Unusual data placement on file rollover in Nifi - HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219646#M60420</link>
      <description>&lt;P&gt;Setting Concurrent Tasks on ExtractText to 3 and reducing Run Duration to 500 ms fixed the problem.&lt;/P&gt;</description>
      <pubDate>Fri, 05 May 2017 02:07:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unusual-data-placement-on-file-rollover-in-Nifi-HDFS/m-p/219646#M60420</guid>
      <dc:creator>elloyd</dc:creator>
      <dc:date>2017-05-05T02:07:21Z</dc:date>
    </item>
  </channel>
</rss>

