NiFi ListFile Processor - Interrupt and restart the run schedule when a new file arrives

Scenario:

  1. I have a ListFile processor watching for an incoming file ("file 1"). I scheduled it to pick up "file 1" 20 seconds after "file 1" is downloaded.
  2. Let's assume that the ListFile processor noticed the incoming "file 1" and started the 20-second delay before picking up "file 1".
  3. In the middle of the 20-second delay, the ListFile processor notices a new incoming file ("file 2").
  4. The ListFile processor is interrupted and resets the 20-second delay, because "file 2" appeared within the previous 20-second delay.

Could the ListFile processor be interrupted in the middle of the 20-second delay and restart the initial delay? I would appreciate help with an explanation and an example.

Thanks.

1 REPLY

@techNerd 

I think your scenario needs a bit more detail for me to understand what you are doing, and what the flow is doing versus what you want it to do.

 

The ListFile processor only lists information about the file(s) found in the target directory. It then generates one or more FlowFiles from that listing. A corresponding FetchFile processor actually retrieves the content for each of the listed files.

From the sound of your scenario, have you somehow instituted a 20-second delay between the ListFile and FetchFile processors?

Or have you configured the run schedule on the ListFile processor to "20 secs"?

Setting the run schedule only tells the processor how often it should request a thread from the NiFi controller that can be used to execute the processor code. Once the processor gets its thread, it will execute. The ListFile processor will list all files present in the target source directory based on the configured file and path filters. For each file listed, it will produce a FlowFile. The run schedule does not mean the processor executes for a full 20 seconds, continuously checking the input directory to see if new files arrive. The run schedule is also not impacted by how long it takes a listing to complete. The processor will request a thread every 20 seconds (00:00:20, 00:00:40, 00:01:00, etc.).

The configured "concurrent tasks" setting controls whether the processor can execute multiple listings in parallel. Let's say the thread that started executing at 00:01:00 was still executing 20 seconds later. Since that thread is still occupying the single concurrent task allowed by default, ListFile would not be allowed to request another thread from the controller at that time.
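To make the timing above concrete, here is a minimal sketch in Python of the behavior as described. It is a toy model, not NiFi's scheduler code: the function name `simulate_schedule` and its parameters are hypothetical, and it simply assumes a tick every `period_s` seconds that is skipped when all concurrent task slots are still busy.

```python
def simulate_schedule(period_s, durations_s, ticks, concurrent_tasks=1):
    """Return the tick times (seconds) at which a run actually starts.

    Toy model (assumption, not NiFi internals): a tick fires every
    `period_s` seconds; a tick is skipped when `concurrent_tasks`
    earlier runs are still executing at that moment.
    """
    started = []
    active_ends = []               # end times of runs still in progress
    remaining = list(durations_s)  # duration of each run, in start order
    for i in range(1, ticks + 1):
        now = i * period_s
        # Drop runs that have already finished by this tick.
        active_ends = [end for end in active_ends if end > now]
        if len(active_ends) >= concurrent_tasks:
            continue  # no free task slot, so this tick does not start a run
        duration = remaining.pop(0) if remaining else 0
        active_ends.append(now + duration)
        started.append(now)
    return started


# The run started at 00:01:00 takes 30 s, so the 00:01:20 tick is skipped.
print(simulate_schedule(20, [1, 1, 30, 1], ticks=5))  # [20, 40, 60, 100]
```

Note that nothing in this model lets a newly arrived file move or reset a tick; the ticks depend only on the configured period.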

Since the run schedule is independent of the thread execution duration, there is no way to dynamically alter the schedule. There is also no way for a new file to get listed at the same time as a previous file within the same thread execution (unless both were already present at the time of listing). ListFile uses the configured "Listing Strategy" to control how it handles the listing of files. A "tracking" strategy is used to prevent the ListFile processor from listing the same file twice by recording some information in a state provider or a cache. If "No Tracking" is configured, ListFile will list all found files every time it executes. ListFile does not remove the source file from the directory. Removal of the source file is a function optionally handled by the corresponding FetchFile processor.

If this is not clear, share more details about your use case and the specifics of your flow design so I can provide more direct feedback.
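The difference between a tracking strategy and "No Tracking" can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the function `list_files` and the `seen` dictionary are hypothetical stand-ins for the processor and its recorded state, and NiFi's actual tracking strategies are more involved than comparing modification times.

```python
import os
import tempfile


def list_files(directory, seen=None):
    """List regular files in `directory` without removing them.

    If `seen` (a dict of path -> last-modified time) is given, only files
    that are new or modified since the previous call are returned, and
    `seen` is updated. With seen=None, every file is returned every time.
    """
    listed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if seen is not None:
            if seen.get(path) == mtime:
                continue  # already listed and unchanged
            seen[path] = mtime
        listed.append(name)
    return listed


with tempfile.TemporaryDirectory() as d:
    state = {}  # hypothetical stand-in for the recorded tracking state
    open(os.path.join(d, "file1"), "w").close()
    print(list_files(d, state))  # ['file1']
    open(os.path.join(d, "file2"), "w").close()
    print(list_files(d, state))  # ['file2'] (file1 is not listed again)
    print(list_files(d))         # ['file1', 'file2'] (no tracking)
```

In this sketch "file 2" is only picked up by whichever listing runs after it appears; it never affects a listing that has already completed.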

Here is the documentation around processor scheduling (works the same no matter which processor is being used):
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

Thank you,

Matt