<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to schedule process to fetch only new files from a directory in apache nifi? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326767#M229928</link>
    <description>&lt;P&gt;Hi samsal,&lt;/P&gt;&lt;P&gt;Thanks for the reply. Can you please share screenshots? I'm a bit confused about which properties to use in ListFile and FetchFile.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Oct 2021 04:39:23 GMT</pubDate>
    <dc:creator>CodeLa</dc:creator>
    <dc:date>2021-10-07T04:39:23Z</dc:date>
    <item>
      <title>How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326729#M229922</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to fetch only the new files added to a directory in Apache NiFi, exactly once: after a file has been picked up, it should not be picked up again. I want to schedule this process to run every 3 hours. Please describe the solution, with screenshots of the properties you set and the processors you use. I am a bit confused about the difference between ListFile, GetFile, and FetchFile, and about which properties to use.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help with this issue will be greatly appreciated.&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 06 Oct 2021 19:50:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326729#M229922</guid>
      <dc:creator>CodeLa</dc:creator>
      <dc:date>2021-10-06T19:50:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326735#M229924</link>
      <description>&lt;P&gt;Take a look at the NiFi ListFile &amp;amp; FetchFile processors. They work together. ListFile reads file metadata and keeps state based on the last modified date of the most recent file it listed, so that only newly added files are listed on later runs. FetchFile takes the filename attribute set by ListFile and fetches the file's contents.&lt;/P&gt;&lt;P&gt;Hope that helps.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Oct 2021 20:28:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326735#M229924</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2021-10-06T20:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326767#M229928</link>
      <description>&lt;P&gt;Hi samsal,&lt;/P&gt;&lt;P&gt;Thanks for the reply. Can you please share screenshots? I'm a bit confused about which properties to use in ListFile and FetchFile.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 04:39:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326767#M229928</guid>
      <dc:creator>CodeLa</dc:creator>
      <dc:date>2021-10-07T04:39:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326875#M229954</link>
      <description>&lt;P&gt;You really don't need a screenshot because you are not changing many properties:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1- Create a ListFile processor &amp;amp; set the "Input Directory" to whatever directory you want to track.&lt;/P&gt;&lt;P&gt;2- Create a FetchFile processor and connect ListFile to it via the "success" relationship. Under the processor properties, keep the "File to Fetch" property set to "${absolute.path}/${filename}", since ListFile sets the path and the file name in those attributes. That is it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After that, the content of the file is passed along via the "success" relationship and you can do whatever you want with it, just as if you were using GetFile. The difference is that ListFile keeps state of the latest file timestamp it has listed, uses that to pick up any new files added to the folder, and then updates the state to the new timestamp, and so on.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 16:47:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326875#M229954</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2021-10-07T16:47:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326879#M229957</link>
      <description>&lt;P&gt;Hi samsal,&lt;/P&gt;&lt;P&gt;Thanks for your help. I have used ListFile followed by FetchFile. There is only one file in my directory, and I've set "Listing Strategy" in ListFile to "Tracking Timestamps". When I executed the job, it brought the file in only once. I am confused: will it bring in the same file only once, or every time I execute the job?&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 17:30:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326879#M229957</guid>
      <dc:creator>CodeLa</dc:creator>
      <dc:date>2021-10-07T17:30:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326880#M229958</link>
      <description>&lt;P&gt;Once it brings the file in, it won't bring it in again, because it saves the file's timestamp and then uses that to pick up only newer files, and so on.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 17:34:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326880#M229958</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2021-10-07T17:34:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326882#M229960</link>
      <description>&lt;P&gt;Got it. Thank you&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 18:06:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326882#M229960</guid>
      <dc:creator>CodeLa</dc:creator>
      <dc:date>2021-10-07T18:06:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326895#M229966</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/92397"&gt;@CodeLa&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I want to point out that tracking timestamps does not always guarantee that NiFi will consume all files from the input directory; it depends on how the files are being placed in that directory.&lt;BR /&gt;&lt;BR /&gt;The ListFile processor looks at the last modified timestamp on each file.&amp;nbsp; It then lists all files modified since the last recorded timestamp, which was stored in the NiFi state manager by the previous processor execution.&amp;nbsp; On the first run there will be no state, and thus everything currently in the directory is listed.&lt;BR /&gt;&lt;BR /&gt;Now consider the scenario below, which can prevent ListFile from listing all files:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The mechanism that is writing the files to that input directory does not update the last modified timestamp on a file once it is done writing to it.&amp;nbsp; Let's say we have file 1 that starts being written at 12:00:01.000 and file 2 that starts being written at 12:00:01.300. File 2 completes first and is consumed by ListFile, and the stored state is updated to reflect 12:00:01.300.&amp;nbsp; Now file 1 completes, but it is never consumed by ListFile since its last modified timestamp is older than that of file 2.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you are in such a scenario, ListFile offers a different "Listing Strategy" called "Tracking Entities", which also tracks filenames in a cache service; this allows it to still list files that may have an older timestamp.&lt;BR /&gt;&lt;BR /&gt;Another thing to consider is that ListFile may list the same file more than once. 
Consider this scenario:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;You tell NiFi ListFile to list files from directory /nifi/myfiles/.&amp;nbsp; The mechanism writing these files to the target directory does update the last modified timestamp as each file is being written, but does not use a ".&amp;lt;filename&amp;gt;" (dot rename) approach to writing these files. (With dot rename, the file is initially a hidden file until the write completes, and is then renamed and made unhidden. The default ListFile configuration ignores hidden files.)&amp;nbsp; So when ListFile runs, it sees that file with a newer last modified timestamp and lists it.&amp;nbsp; Then on the next execution it sees the same file again, because its last modified timestamp has been updated as the file is still being written to.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;If you are in such a scenario, you would want to make use of the "Minimum File Age" property.&amp;nbsp; This property tells ListFile to ignore any file whose last modified timestamp, when compared to the current time, is not at least the configured amount of time old (that is, the last modified timestamp has not changed for the configured amount of time).&amp;nbsp; The configured time is arbitrary: use whatever length you need to be confident that the file write is complete.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Something else you need to consider applies if both of the following are true:&lt;/P&gt;&lt;P&gt;1. You are using a multi-node NiFi cluster.&lt;BR /&gt;2. The configured directory you are listing from is mounted on every node.&lt;BR /&gt;&lt;BR /&gt;Since every node in a NiFi cluster executes the same dataflow, you want to prevent every node from listing the same files. 
In this scenario you would change the "Execution" configuration from "All nodes" to "Primary node" on the ListFile processor and change "Input Directory Location" from "Local" to "Remote".&amp;nbsp; Then you will want to set "Load Balance Strategy" to "Round robin" on the connection between ListFile and FetchFile.&lt;BR /&gt;&lt;BR /&gt;NOTE: Never set the Execution to "Primary node" on any processor that has an inbound connection.&amp;nbsp; Only processors with no inbound connection should be considered for this execution configuration.&lt;BR /&gt;&lt;BR /&gt;I know this is a lot to digest, but it is very important to be aware of to ensure success.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If you found this response assisted with your query, please take a moment to login and click on "&lt;STRONG&gt;Accept as Solution&lt;/STRONG&gt;" below this post.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;Matt&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 21:40:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/326895#M229966</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2021-10-07T21:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/327001#M229974</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Matt thanks for the explanation&lt;/P&gt;</description>
      <pubDate>Fri, 08 Oct 2021 10:51:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/327001#M229974</guid>
      <dc:creator>CodeLa</dc:creator>
      <dc:date>2021-10-08T10:51:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391073#M247459</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I am facing an issue here. If I add multiple files with the same timestamp, ListFile picks up only 1 or 2 out of 20 files. Will the ListFile "Tracking Entities" strategy resolve this problem?&lt;/P&gt;</description>
      <pubDate>Sun, 28 Jul 2024 08:32:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391073#M247459</guid>
      <dc:creator>varungupta</dc:creator>
      <dc:date>2024-07-28T08:32:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391122#M247482</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/112330"&gt;@varungupta&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;This is a ~3 year old post with an already accepted answer.&amp;nbsp; You are likely to get more responsive answers if you start a new thread.&amp;nbsp; NiFi has also evolved considerably over the past 3 years.&lt;BR /&gt;&lt;BR /&gt;Yes, Tracking Entities does not rely on timestamps alone to ensure listing of new files and will help you here.&amp;nbsp; NiFi grabbing only 1-2 of 20 files is about more than just timestamps; I suspect that how the files are being moved into the consumption directory is also impacting you.&lt;BR /&gt;&lt;BR /&gt;Tracking Timestamps is the easiest default setup and consumes the fewest resources, but it does not work for all use cases.&amp;nbsp;&lt;BR /&gt;It is based on each file's last modified timestamp.&amp;nbsp; When a listing is performed, it lists all files with a last modified timestamp between the timestamp stored in the processor's state and the most recent file's last modified timestamp.&amp;nbsp; Problems can happen if the last modified timestamp is not updated.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;For example, some system writes to directory A on your local machine and, after the write completes, moves the file to directory B.&amp;nbsp; With that atomic move the file timestamp is not updated.&amp;nbsp; If the move does not happen fast enough, the file may get missed in the current listing. It is also possible that a moved file has an older last modified timestamp than another, smaller file that was moved to directory B more quickly.&amp;nbsp; The timestamp stored in state would then be newer, and the older file would be ignored.&lt;BR /&gt;&lt;BR /&gt;Tracking Entities was added as a solution to these types of problems.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Please help our community grow. 
If you found&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;any&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "&lt;SPAN&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;Accept as Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/EM&gt;" on&amp;nbsp;&lt;STRONG&gt;one or more&lt;/STRONG&gt;&amp;nbsp;of them that helped.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you,&lt;BR /&gt;Matt&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jul 2024 16:13:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391122#M247482</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2024-07-29T16:13:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to schedule process to fetch only new files from a directory in apache nifi?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391212#M247520</link>
      <description>&lt;P&gt;Thanks a lot Matt for the answer.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jul 2024 14:41:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/How-to-schedule-process-to-fetch-only-new-files-from-a/m-p/391212#M247520</guid>
      <dc:creator>varungupta</dc:creator>
      <dc:date>2024-07-31T14:41:53Z</dc:date>
    </item>
  </channel>
</rss>

