Support Questions
Find answers, ask questions, and share your expertise

ListFile to list all the files sorted by datetime

New Contributor

How do I use ListFile to process the files in datetime order , I need to process the files in chronological order, oldest file first ?

1 ACCEPTED SOLUTION

Master Guru

@Shu @Raja M

I just want to correct one thing. There is no default prioritizer when none are selected. The "OldestFlowFileFirstPrioritizer" while it may appear in many cases as the default behavior you see it is purely coincidental. By default the order in which Flowfiles are processed from a queue is performance based. This means Flowfiles are processed in a order that best makes use of disk performance to minimize disk seeks. (So think of this as processed in order of written to disk.) In many cases this acts like oldestFlowFileFirst, but that can change if FlowFiles in a connection come from multiple sources flows.

-

Enforcing the order of FlowFile processing in NiFi can be challenging. Some processors work on batches of Files while other works on one FlowFile a ta time. FlowFiles routed down different paths are processed with no consideration of FlowFiles processed down a different path. Concurrent tasks on processors allow for concurrent execution of a processor (each task works on its own FlowFile.) with some FlowFile being processed faster then others making them complete out of order. Some processor may fail to complete a task for one reason or another in normal operations (FetchFile retrieving content and network issue causes connection to drop. FlowFile is penalized and routed to "failure" relationship. FetchFile moves on to next FlowFile and retries the failed FlowFile if Failure is looped back and once penalty expires. Now these Flowfiles are out of order).

-

NiFi was designed for speed at its core with the intent of each processor to work on FlowFIles it recieved with out needing tio care about other FlowFiles in any other queues.

-

There are a few processors introduced that may be used to help in your dataflow design to achieve this goal. Keep in mind that any enforcement of order is going to affect throughput of your NiFi because of the overhead introduced in doing so.

You will want to take a look at the following processors:

1 EnforceOrder <-- This processor works well fo numerically order Flowfiles which timestamps are not going to provide.

-

2 Wait and notify. <-- This allows you to enforce the processing of one FlowFile at a time in order.

----- Upon listing your Flowfiles, you would feed a wait processor. This processor could release one FlowFile in o the rest of your dataflow (FetchFile...etc...) and finally the notify processor, once processing of the FlowFile was successful. The notify would then trigger the Wait processor to release next FlowFile. (Set OldestFlowFileFirst prioritizer on connection between ListFile and Wait processors)

-

Thank you,

Matt

View solution in original post

3 REPLIES 3

Super Guru

@Raja M

List file processor with default configurations already does what you are expecting(lists/sends oldest file first) unless if you configured sucess queue with some kind of prioritizers.

ListFile is a stateful processor and creates a flowfile/s that represents the file/s based on the last modified time and default queue prioritizer is OldestFlowfileFirst prioritizer.

if you are using default queue configs then the processor will send first the oldest flowfile in the data flow will be processed first.

Available NiFi queue prioritizers are as follows:

  • FirstInFirstOutPrioritizer: Given two FlowFiles, the one that reached the connection first will be processed first.
  • NewestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is newest in the dataflow will be processed first.
  • OldestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is oldest in the dataflow will be processed first. 'This is the default scheme that is used if no prioritizers are selected.'
  • PriorityAttributePrioritizer: Given two FlowFiles that both have a "priority" attribute, the one that has the highest priority value will be processed first. Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set. Values for the "priority" attribute may be alphanumeric, where "a" is a higher priority than "z", and "1" is a higher priority than "9", for example.

To get to know about which files are transferred from the list file processor use Control Rate processor with

Rate Control Criteria

flowfile count

Maximum Rate

1

Now you can view in the success queue of Control Rate processor which flowfiles are released from the list file processor.

-

If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

Master Guru

@Shu @Raja M

I just want to correct one thing. There is no default prioritizer when none are selected. The "OldestFlowFileFirstPrioritizer" while it may appear in many cases as the default behavior you see it is purely coincidental. By default the order in which Flowfiles are processed from a queue is performance based. This means Flowfiles are processed in a order that best makes use of disk performance to minimize disk seeks. (So think of this as processed in order of written to disk.) In many cases this acts like oldestFlowFileFirst, but that can change if FlowFiles in a connection come from multiple sources flows.

-

Enforcing the order of FlowFile processing in NiFi can be challenging. Some processors work on batches of Files while other works on one FlowFile a ta time. FlowFiles routed down different paths are processed with no consideration of FlowFiles processed down a different path. Concurrent tasks on processors allow for concurrent execution of a processor (each task works on its own FlowFile.) with some FlowFile being processed faster then others making them complete out of order. Some processor may fail to complete a task for one reason or another in normal operations (FetchFile retrieving content and network issue causes connection to drop. FlowFile is penalized and routed to "failure" relationship. FetchFile moves on to next FlowFile and retries the failed FlowFile if Failure is looped back and once penalty expires. Now these Flowfiles are out of order).

-

NiFi was designed for speed at its core with the intent of each processor to work on FlowFIles it recieved with out needing tio care about other FlowFiles in any other queues.

-

There are a few processors introduced that may be used to help in your dataflow design to achieve this goal. Keep in mind that any enforcement of order is going to affect throughput of your NiFi because of the overhead introduced in doing so.

You will want to take a look at the following processors:

1 EnforceOrder <-- This processor works well fo numerically order Flowfiles which timestamps are not going to provide.

-

2 Wait and notify. <-- This allows you to enforce the processing of one FlowFile at a time in order.

----- Upon listing your Flowfiles, you would feed a wait processor. This processor could release one FlowFile in o the rest of your dataflow (FetchFile...etc...) and finally the notify processor, once processing of the FlowFile was successful. The notify would then trigger the Wait processor to release next FlowFile. (Set OldestFlowFileFirst prioritizer on connection between ListFile and Wait processors)

-

Thank you,

Matt

New Contributor

Yes.. I have to select "OldestFlowFileFirstPrioritizer" to make it work. Thank you @Matt Clarke and @Shu

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.