NiFi: Cron Schedule not working as expected

Contributor

Problem: I have set up a ListSFTP processor to run between 2 am and 2 pm every day, but the processor is not picking up the files.

Existing setup and configuration: The files are generated on the SFTP server between 12 am and 1 am every day.

ListSFTP Configuration

Schedule:

[Screenshot: Jagapriyan_0-1666942312909.png]

Listing Strategy: Tracking Timestamps

Other tracking configuration:

[Screenshot: Jagapriyan_1-1666942595360.png]

When I start this processor, it runs as expected for a day or two, and after that the files are no longer picked up. Is the tracking time window of 3 hours preventing the files from being listed?


Super Mentor

@Jagapriyan 
Since you are using the "Tracking Timestamps" listing strategy, the "Entity Tracking Time Window" property is not used. The "Tracking Timestamps" strategy depends heavily on the timestamps of the target files. Typically, when files are not being picked up, it is because their timestamps are equal to or older than the last timestamp recorded in the ListSFTP processor's state. This can happen when files land in the SFTP server's target folders without their last modified timestamp being updated. For example, moving a file from another directory into an SFTP server directory keeps its old timestamp, whereas a copy updates the timestamp because the file is written again.

- Does your target SFTP path have multiple sub-directories that are being searched? Is Search Recursively set to "true"?
- Are there symlink directories in use?
- Have you looked at the timestamp recorded in state for your SFTP server directories? Do your missed files have older timestamps?
- How many files, on average, are written to the target SFTP directory between 12 am and 1 am each day?
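
To illustrate why an older timestamp causes a file to be skipped, here is a minimal Python sketch of a timestamp-tracking listing. It is a simplified model and not NiFi's actual implementation; the function name `list_by_timestamp`, the paths, and the integer timestamps are all hypothetical.

```python
from typing import Dict, List, Tuple


def list_by_timestamp(files: Dict[str, int], last_seen: int) -> Tuple[List[str], int]:
    """Simplified model of a timestamp-tracking listing (not NiFi code).

    files maps a path to its last modified time (epoch seconds).
    Only files strictly newer than last_seen are listed; files with an
    equal or older timestamp are skipped.
    """
    listed = sorted(path for path, mtime in files.items() if mtime > last_seen)
    # The stored state advances to the newest timestamp seen so far.
    newest = max([last_seen] + [files[path] for path in listed])
    return listed, newest


# Day 1: two files in different sub-directories are listed.
state = 0
day1 = {"/in/a/report1.csv": 1000, "/in/b/report2.csv": 1100}
listed, state = list_by_timestamp(day1, state)
print(listed, state)  # both files are listed, state becomes 1100

# Day 2: a file is moved in and keeps its original, older timestamp.
day2 = {**day1, "/in/a/report3.csv": 900}
listed, state = list_by_timestamp(day2, state)
print(listed, state)  # nothing is listed: 900 is older than the stored 1100
```

In this model, once the state has advanced to 1100, any file that arrives with a last modified time of 1100 or earlier is never listed, no matter how often the processor runs.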

I also see you have a minimum file age of 5 minutes. This means a file's last modified timestamp must be at least 5 minutes older than the execution time of your processor for the file to be eligible for consumption. You stated that your files are placed on the SFTP server between 12 am and 1 am each day, and that you scheduled your ListSFTP processor with a cron schedule that fires at 10 minutes and 1 second past every hour between 2 am and 2 pm. Why not just have your ListSFTP processor run all the time? Is this because timestamps are not being updated consistently?
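
As a rough illustration of the minimum file age check, assuming the rule is simply "execution time minus last modified time is at least the minimum age" (the exact boundary behavior in NiFi is an assumption here, and `is_old_enough` and the sample dates are hypothetical):

```python
from datetime import datetime, timedelta


def is_old_enough(last_modified: datetime, run_time: datetime,
                  min_age: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the file is at least min_age old at run_time."""
    return run_time - last_modified >= min_age


# Hypothetical run at 02:10:01, matching the schedule described above.
run_time = datetime(2022, 10, 28, 2, 10, 1)

# A file written at 00:45 is well past the 5 minute minimum age.
print(is_old_enough(datetime(2022, 10, 28, 0, 45, 0), run_time))  # True

# A file written at 02:07 is only 3 minutes 1 second old, so it must wait.
print(is_old_enough(datetime(2022, 10, 28, 2, 7, 0), run_time))  # False
```

With files arriving between 12 am and 1 am and the first run at 2:10 am, the minimum file age alone should not keep the files from being listed.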

If you switch to the "Tracking Entities" listing strategy instead, do you still see the issue? Tracking Entities works when there are issues with timestamps and was developed for that reason.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt

 

Contributor

Hi @MattWho 

Please find my responses below.

- Does your target SFTP path have multiple sub-directories that are being searched? Is Search Recursively set to "true"? --> Search Recursively is set to true.
- Are there symlink directories in use? --> No.
- Have you looked at the timestamp recorded in state for your SFTP server directories? Do your missed files have older timestamps? --> The missing files have older timestamps.
- How many files, on average, are written to the target SFTP directory between 12 am and 1 am each day? --> The file count ranges from 10 to 100, and these files are not being picked up.

Why not just have your ListSFTP processor run all the time? Is this because timestamps are not being updated consistently? --> Even running the processor all the time with a cron schedule does not pick up the files.

Super Mentor

@Jagapriyan I suspect the issue is with the last modified timestamps: the missed files have older last modified timestamps than what was already consumed from the target directory, and the sub-directory structure compounds the problem. My recommendation is to switch to the "Tracking Entities" listing strategy instead.
Tracking Entities keeps track of both filenames and timestamps, so even a file with an older timestamp will be consumed if its filename is not in the tracked entities list stored in the distributed cache.
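
A minimal Python sketch of the idea, for comparison with timestamp-only tracking. This is a simplified model and not NiFi's implementation: it uses an in-memory dict in place of the distributed cache, ignores the Entity Tracking Time Window, and the name `list_by_entity` and the sample data are hypothetical.

```python
from typing import Dict, List


def list_by_entity(files: Dict[str, int], tracked: Dict[str, int]) -> List[str]:
    """Simplified model of entity tracking (not NiFi code).

    files maps a path to its last modified time (epoch seconds).
    tracked stands in for the cache of already listed entities.
    A file is listed if its path is unknown, or if its timestamp is
    newer than the one recorded for that path.
    """
    listed = []
    for path, mtime in sorted(files.items()):
        if path not in tracked or mtime > tracked[path]:
            listed.append(path)
            tracked[path] = mtime  # remember what has been listed
    return listed


tracked: Dict[str, int] = {}

# Day 1: both files are new, so both are listed.
day1 = {"/in/a/report1.csv": 1000, "/in/b/report2.csv": 1100}
print(list_by_entity(day1, tracked))

# Day 2: report3.csv has an older timestamp, but its path is not tracked yet.
day2 = {**day1, "/in/a/report3.csv": 900}
print(list_by_entity(day2, tracked))  # only report3.csv is listed
```

Because the decision is made per filename rather than against a single latest timestamp, the late file with the older timestamp is still picked up.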

Let me know if making this change resolves your issue.


Thank you,

Matt



Contributor

Thanks @MattWho 

This has solved the problem.