<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question NiFi's GetHDFS processor with Cron schedule not reading all files in the directory in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215793#M63627</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm running a GetHDFS processor on NiFi (standalone instance, 1.0.1) with this cron schedule: 0 30 0 * * ?&lt;/P&gt;&lt;P&gt;I want the processor to start at 12:30 AM daily. With the above schedule, the processor started at the expected time this morning and read some files, but it didn't finish reading all of them. I had quite a few files to read in the directory yesterday, and right now 1200+ files are still left in it. I have "Keep Source File" set to false, so the processor should delete files as it reads them; the files left in the directory therefore haven't been read.&lt;/P&gt;&lt;P&gt;My understanding is that, with the above schedule, once GetHDFS starts it should keep reading until all the files in the directory are exhausted, so I don't understand why some files are still left.&lt;/P&gt;&lt;P&gt;Please help, thank you.&lt;/P&gt;</description>
    <pubDate>Sun, 25 Jun 2017 22:39:25 GMT</pubDate>
    <dc:creator>Raj_B</dc:creator>
    <dc:date>2017-06-25T22:39:25Z</dc:date>
    <item>
      <title>NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215793#M63627</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm running a GetHDFS processor on NiFi (standalone instance, 1.0.1) with this cron schedule: 0 30 0 * * ?&lt;/P&gt;&lt;P&gt;I want the processor to start at 12:30 AM daily. With the above schedule, the processor started at the expected time this morning and read some files, but it didn't finish reading all of them. I had quite a few files to read in the directory yesterday, and right now 1200+ files are still left in it. I have "Keep Source File" set to false, so the processor should delete files as it reads them; the files left in the directory therefore haven't been read.&lt;/P&gt;&lt;P&gt;My understanding is that, with the above schedule, once GetHDFS starts it should keep reading until all the files in the directory are exhausted, so I don't understand why some files are still left.&lt;/P&gt;&lt;P&gt;Please help, thank you.&lt;/P&gt;</description>
      <pubDate>Sun, 25 Jun 2017 22:39:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215793#M63627</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-06-25T22:39:25Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215794#M63628</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/10100/rbolla.html"&gt;Raj B&lt;/A&gt; This looks similar to &lt;A href="https://issues.apache.org/jira/browse/NIFI-4069"&gt;NIFI-4069&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;As a workaround, please try changing the cron schedule to 0,30 30 0 * * ? so that it runs twice in the same minute.&lt;/P&gt;&lt;P&gt;Let us know if that helps.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 17:08:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215794#M63628</guid>
      <dc:creator>Schandhok</dc:creator>
      <dc:date>2017-06-26T17:08:13Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215795#M63629</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13875/schandhok.html" nodeid="13875"&gt;@Shashank Chandhok&lt;/A&gt; Changing the schedule to "0,30 30 0 * * ?" helped read a few additional files, but many files still remain in the directory.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 20:41:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215795#M63629</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-06-26T20:41:23Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215796#M63630</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/10100/rbolla.html"&gt;Raj B&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please check the timestamps of the files remaining in the directory. Are they being added while the processor is running, or are their timestamps older than the processor's CRON run time?&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 21:36:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215796#M63630</guid>
      <dc:creator>Schandhok</dc:creator>
      <dc:date>2017-06-26T21:36:09Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215797#M63631</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/13875/schandhok.html" nodeid="13875"&gt;@Shashank Chandhok&lt;/A&gt; Actually, the files I'm trying to process are from the day before. In the directory path of my GetHDFS processor, I'm using expression language to point to the directory that was created yesterday, and the files in that directory are from yesterday. So when the CRON scheduler starts at 12:30 AM, all the files that need to be processed should already be in that directory.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 22:06:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215797#M63631</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-06-26T22:06:26Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215798#M63632</link>
      <description>&lt;P&gt;I'm not sure why I need to schedule the GetHDFS processor to run continuously (I set it to run every 15 seconds), but this schedule exhausts all files from the directory: 0/15 * * * * ?&lt;/P&gt;&lt;P&gt;In my case, since I'm loading files the next day (the GetHDFS directory path points to the previous day's directory), this resolves the issue I was facing.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jun 2017 22:09:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215798#M63632</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-06-26T22:09:27Z</dc:date>
    </item>
    <item>
      <title>Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215799#M63633</link>
      <description>&lt;P&gt;Thanks to &lt;A rel="user" href="https://community.cloudera.com/users/363/bbende.html" nodeid="363"&gt;@Bryan Bende&lt;/A&gt;, I learned that I needed to change the batch size property in GetHDFS to read all files in the directory.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/108547/need-clarification-on-how-nifi-processors-run-with.html#answer-109798" target="_blank"&gt;https://community.hortonworks.com/questions/108547/need-clarification-on-how-nifi-processors-run-with.html#answer-109798&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jun 2017 00:20:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/NiFi-s-GetHDFS-processor-with-Cron-schedule-not-reading-all/m-p/215799#M63633</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-06-29T00:20:05Z</dc:date>
    </item>
  </channel>
</rss>

