ListSFTP and FetchFTP duplicate files generated


Hi,

We have a process that generates text files on an FTP server. The process writes each file in multiple passes, so a file can take anywhere from about a minute to a few minutes to complete.

Because we are using the tracking timestamps strategy, NiFi's ListSFTP lists the same file multiple times: the file's timestamp is updated continuously while it is being generated, so each listing sees it as new.

We don't have control over the existing process that generates the files, and it has no notification mechanism that could tell us when file generation is complete.

What would be the best practice for handling this scenario in a ListSFTP + FetchSFTP flow, so that duplicate files are not created in the target system?
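
To make the scenario concrete, here is a minimal Python sketch of the behavior described above. It is not NiFi code: the functions `list_new_files` and `count_listings`, the `min_age_ms` parameter, the file name and the timings are all hypothetical, and the timestamp tracking is heavily simplified. It only illustrates why a file whose timestamp keeps moving gets listed on every poll, and how skipping files that were modified too recently would change that.

```python
def list_new_files(entries, last_seen_ms, now_ms, min_age_ms=0):
    """Return (names_to_list, new_last_seen_ms) for one polling cycle.

    entries: iterable of (name, modified_ms) as seen on the server right now.
    """
    listed = []
    newest = last_seen_ms
    for name, modified_ms in entries:
        if modified_ms <= last_seen_ms:
            continue  # not newer than what was already listed
        if now_ms - modified_ms < min_age_ms:
            continue  # modified too recently; may still be being written
        listed.append(name)
        newest = max(newest, modified_ms)
    return listed, newest


def count_listings(polls, min_age_ms=0):
    """Run several polling cycles and count how often each file is listed."""
    counts = {}
    last_seen_ms = -1
    for now_ms, entries in polls:
        listed, last_seen_ms = list_new_files(
            entries, last_seen_ms, now_ms, min_age_ms
        )
        for name in listed:
            counts[name] = counts.get(name, 0) + 1
    return counts


# One file written over two minutes; its timestamp moves between polls.
# Each tuple is (poll time in ms, [(file name, last-modified time in ms)]).
polls = [
    (60_000, [("report.txt", 59_000)]),
    (120_000, [("report.txt", 119_000)]),
    (300_000, [("report.txt", 120_000)]),
]

print(count_listings(polls))                     # {'report.txt': 3}
print(count_listings(polls, min_age_ms=60_000))  # {'report.txt': 1}
```

In this toy model the threshold has to be longer than the longest pause between write passes; otherwise a half-written file could still be picked up.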

 

Please advise.

Thank you

4 REPLIES


Thank you @MattWho. If we are not able to upgrade to 1.10 or backport the feature to our current version, would the Wait processor be an acceptable direction to pursue?


Hi @MattWho 

We are asking Cloudera for a backport of this feature to our current release, as we are unable to upgrade HDF to a version that includes it.

Meanwhile, I am trying to implement the same behavior using ExecuteScript (penalizing the flow file) and then eliminating duplicates.
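
For reference, the penalize step could look roughly like the sketch below. It assumes ExecuteScript is configured with the python (Jython) engine, which binds `session` and `REL_SUCCESS`; the helper name `penalize_and_requeue` is hypothetical. The intent is that the penalized flow file is held back for the processor's Penalty Duration before the next step sees it, which I have not verified on every version.

```python
def penalize_and_requeue(session, rel_success):
    """Penalize the next flow file (if any) and pass it along.

    `session` is expected to behave like NiFi's ProcessSession:
    get(), penalize(flowFile) and transfer(flowFile, relationship).
    """
    flow_file = session.get()
    if flow_file is None:
        return None  # nothing queued for this trigger
    flow_file = session.penalize(flow_file)
    session.transfer(flow_file, rel_success)
    return flow_file


# Inside ExecuteScript the engine provides `session` and `REL_SUCCESS`.
# Outside NiFi these names do not exist, so the call is skipped.
if "session" in globals():
    penalize_and_requeue(session, REL_SUCCESS)
```

Keeping the logic in a function makes it possible to try it outside NiFi with a stand-in session object.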

 

When eliminating the duplicates, DetectDuplicate does not work as expected once flow files with two different filenames are in the flow. Is this expected behavior? How do we handle this scenario so that we end up with a unique list of files when multiple files arrive over a period of time?

Capture.PNG

 

The first GenerateFlowFile sets filename to file1.txt and the second GenerateFlowFile sets filename to file2.txt.

When I disable one of the GenerateFlowFile processors, the duplicates are filtered as expected. But when both GenerateFlowFile processors are running, we get duplicates.
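
To be clear about what I expect, here is a small Python sketch of the result I am after. It is not how DetectDuplicate is implemented; the function `filter_duplicates` is hypothetical and it assumes the key used for comparison is the filename. In DetectDuplicate that key comes from the Cache Entry Identifier property, so it may be worth checking what that expression evaluates to for each flow file.

```python
def filter_duplicates(filenames):
    """Keep the first occurrence of each filename, in arrival order."""
    seen = set()
    unique = []
    for name in filenames:
        if name in seen:
            continue  # already passed through once; treat as duplicate
        seen.add(name)
        unique.append(name)
    return unique


# Two generators running at the same time, so the names arrive interleaved.
arrivals = ["file1.txt", "file2.txt", "file1.txt", "file2.txt", "file1.txt"]
print(filter_duplicates(arrivals))  # ['file1.txt', 'file2.txt']
```

With both sources running I would expect exactly one file1.txt and one file2.txt to come out, regardless of how the arrivals interleave.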

 

Capture2.PNG

 

Thank you