Created 10-19-2018 01:06 PM
Hi,
I have tried to fetch files from several folders on a remote server simultaneously using ListSFTP -> FetchSFTP. As a result, some files are missing from the download without any error message (comm failure, file not found, or permission denied). When I checked nifi-app.log, I found messages like the one below. What does this message mean? Does it have something to do with the "Data timeout" and/or "Connection timeout" parameters in the FetchSFTP or ListSFTP processors?
Thank you for your suggestion.
Emma
2018-10-18 09:54:30,163 INFO [Provenance Maintenance Thread-2]o.a.n.p.expiration.FileRemovalAction Removed expired ProvenanceTable-of-Contents file /nifi-repository-provenance/toc/64059747.toc
Created 10-19-2018 06:05 PM
-
The list-based processors rely on file timestamps to determine whether a file should be listed. This means that the list-based processors may not list files in the target location if:
-
1. New files added to the source location do not have their timestamp updated. (The last timestamp NiFi recorded from the previous listing is then newer than the timestamp of the file that was added.)
2. Multiple files are being written to the source location at the same time and the list-based processor did not list all of them in one execution. The second execution would then miss the other files because of the timestamp recorded during the first listing.
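
To make case 1 concrete, here is a minimal sketch of timestamp-based listing. It is a simplified model, not NiFi's actual code: the names `RemoteFile` and `list_new_files` are made up for illustration, and timestamps are plain integers standing in for seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RemoteFile:
    name: str
    mtime: int  # last-modified time in whole seconds (illustrative values)


def list_new_files(
    files: List[RemoteFile], last_listed_mtime: Optional[int]
) -> Tuple[List[RemoteFile], Optional[int]]:
    """Return the files strictly newer than the stored timestamp,
    together with the timestamp to store for the next run."""
    new_files = [
        f for f in files
        if last_listed_mtime is None or f.mtime > last_listed_mtime
    ]
    # Keep the old state when nothing new was listed.
    newest = max((f.mtime for f in new_files), default=last_listed_mtime)
    return new_files, newest


# First run: both files are listed and the newest timestamp (105) is stored.
directory = [RemoteFile("a.csv", 100), RemoteFile("b.csv", 105)]
first, state = list_new_files(directory, None)

# A file arrives later but keeps an older timestamp (case 1 above).
directory.append(RemoteFile("c.csv", 90))
second, state = list_new_files(directory, state)

# c.csv was never listed, and no error was raised anywhere.
missed = [f.name for f in directory if f not in first and f not in second]
print(missed)
```

In this model the file is skipped silently, which matches the symptom of files missing with no error in the log.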
-
Not really sure what NiFi version you are running, but here are a few Jiras aimed at making list based processors work much better:
1. https://jira.apache.org/jira/browse/NIFI-3332 <-- (Addressed as of Apache NiFi 1.4.0)
2. https://jira.apache.org/jira/browse/NIFI-4069 <-- (Addressed as of Apache NiFi 1.4.0)
3. https://jira.apache.org/jira/browse/NIFI-5157 <-- (Addressed as of Apache NiFi 1.8.0 being released soon)
-
Thank you,
Matt
-
If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
Created 11-07-2018 05:27 PM
-
*** Community Forum Tip: Try to avoid starting a new answer in response to an existing answer. Instead, use comments to respond to existing answers. There is no guaranteed order to different answers, which can make it hard to follow a discussion.
-
For performance reasons, the method used to determine which source files are listed is very simple and depends on the accuracy of the timestamps on the source files.
-
Let's assume that within a source directory you have a 1 KB file written with a timestamp accurate down to seconds (2018-11-06 16:49:22), which NiFi lists. While that listing is occurring, another file is being written to the same directory but has not finished being written and is still a "." (hidden) file, which the list processor ignores by default. The ListSFTP processor records the timestamp of the newest listed source file in state. On the next execution of ListSFTP, only files with a timestamp newer than "2018-11-06 16:49:22" would be listed. So possibly that other file that was still being written was completed and renamed (to remove the ".") within the same second as the last listing. NiFi would then exclude it from the next listing. Another possibility is that the system writing the files to the source SFTP server directories is not updating the LastModified timestamps on the "new" files. This results in some "new" source files having older timestamps.
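
The hidden-file timeline above can be sketched as follows. This is only a toy model under stated assumptions, not NiFi's implementation: `run_listing` is a hypothetical helper, the directory is a plain dict of name to timestamp, `T` is an arbitrary integer standing in for 16:49:22, and the rename is assumed to leave the timestamp unchanged.

```python
from typing import Dict, List, Optional, Tuple


def run_listing(
    directory: Dict[str, int], state: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """One listing pass: skip hidden files, list only files strictly
    newer than the stored timestamp, then store the newest one listed."""
    candidates = {n: t for n, t in directory.items() if not n.startswith(".")}
    listed = sorted(
        n for n, t in candidates.items() if state is None or t > state
    )
    if listed:
        state = max(candidates[n] for n in listed)
    return listed, state


T = 1000  # arbitrary value standing in for 2018-11-06 16:49:22

# First listing: one finished file and one still being written (hidden).
directory = {"done.dat": T, ".partial.dat": T}
first, state = run_listing(directory, None)

# The writer finishes and renames the file within the same second.
directory["partial.dat"] = directory.pop(".partial.dat")
second, state = run_listing(directory, state)

print(first)   # only done.dat was listed
print(second)  # empty: partial.dat is not newer than the stored timestamp
```

Because the comparison is "strictly newer than the stored timestamp", a file that becomes visible with the same second-level timestamp is never picked up in this model.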
-
If any of this is the case, perhaps https://jira.apache.org/jira/browse/NIFI-5406, which is included in Apache NiFi 1.8.x, will help.
-
Thank you,
Matt
Created 11-07-2018 12:43 PM
Thank you @Matt Clarke for your answer. Actually, my problem seems to match the 2nd explanation: multiple files are being written to the remote server at the same time. But I do not understand why the ListSFTP processor cannot list all of them. Does it have something to do with the value of the "Data timeout" or "Connection timeout" parameter in this processor?
My nifi version is 1.5.0.
Thank you in advance for your help.
Emma