Re: ListSFTP and FetchFTP duplicate files generated


@littlesea374 

 

The Wait processor requires a release signal, which is typically produced by the Notify processor, so on its own it will not help here.

Perhaps you could try penalizing each FlowFile. Penalized FlowFiles are not processed by the next downstream processor until the penalty duration has elapsed. This can be done with an ExecuteScript processor placed after ListSFTP:

[Screenshot: ExecuteScript processor placed directly after ListSFTP]

 

You then set the "Penalty Duration" on the processor's Settings tab. Set the penalty high enough to ensure the source file writes have completed. Of course, this does introduce some latency.
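A minimal sketch of the ExecuteScript body that applies the penalty, written in Groovy. The `session` and `REL_SUCCESS` objects are injected into the script by the ExecuteScript processor, so this fragment only runs inside NiFi:

```groovy
// ExecuteScript body (Groovy). `session` and `REL_SUCCESS` are
// bindings provided by NiFi's ExecuteScript processor.
def flowFile = session.get()
if (flowFile != null) {
    // Penalize the FlowFile; downstream processors will not pick it up
    // until this processor's configured Penalty Duration has elapsed.
    flowFile = session.penalize(flowFile)
    session.transfer(flowFile, REL_SUCCESS)
}
```

The actual duration is not set in the script; it comes from the "Penalty Duration" property on the processor's Settings tab.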

What this will not help with is ListSFTP still listing the same files multiple times. As data is written to a source file, its last-modified timestamp updates, so ListSFTP lists it again as if it were a new file.

But the delay here allows the full data to be written, and then perhaps you can use a DetectDuplicate processor to remove duplicates based on filename before you actually fetch the content.
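As a rough configuration sketch, DetectDuplicate could key on the filename. This assumes a DistributedMapCacheClientService controller service is already set up; the "1 hour" age-off value below is a hypothetical starting point to tune to your ingest window:

```
DetectDuplicate properties (sketch):
  Cache Entry Identifier    : ${filename}
  Distributed Cache Service : <your DistributedMapCacheClientService>
  Age Off Duration          : 1 hour
```

FlowFiles whose filename has already been seen within the age-off window are routed to the `duplicate` relationship, which you can auto-terminate so only the first listing reaches FetchSFTP.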

Just some thoughts here, but that Jira is probably the best path...

 

Matt
