NiFi - How to use the GetSFTP processor without repeating files

Is there a way to configure NiFi so that it does not pull the same files again? It appears to pull the same files more than once if some files in the directory were modified.

In other words, could you explain the full algorithm GetSFTP uses to track which files it has already downloaded?

And how does it behave if the NiFi process or the server restarts?

1 ACCEPTED SOLUTION

Re: NiFi - How to use the GetSFTP processor without repeating files

Wes,

The next version of NiFi will have new components like FetchSFTP and ListSFTP [1]:

  1. FetchSFTP will take the file information from an incoming flow file and download the requested file.
  2. ListSFTP will list the contents of a remote directory and send a flow file for each entry down the flow, to be processed or downloaded next. I think it will also be able to use a distributed cache to maintain state on the NiFi side and avoid re-processing (useful because we don't always have the option of deleting files on the remote server).
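
To make the list-then-fetch idea concrete, here is a minimal Python sketch of how a listing step could remember what it has already emitted. It is not NiFi's actual implementation: `RemoteFile`, `ListingState` and `list_new_files` are hypothetical names, and tracking by last-modified timestamp is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RemoteFile:
    path: str
    filename: str
    modified: int  # last-modified time, epoch milliseconds


class ListingState:
    """Hypothetical stand-in for the state kept between listing runs."""

    def __init__(self) -> None:
        self.latest_modified = -1            # newest timestamp emitted so far
        self.seen_at_latest: Set[str] = set()  # files emitted at that timestamp


def list_new_files(listing: Iterable[RemoteFile], state: ListingState) -> List[RemoteFile]:
    """Return only the entries not emitted by an earlier run (assumed algorithm)."""
    new_files: List[RemoteFile] = []
    ordered = sorted(listing, key=lambda f: (f.modified, f.path, f.filename))
    for f in ordered:
        key = f"{f.path}/{f.filename}"
        if f.modified < state.latest_modified:
            continue  # older than anything already emitted
        if f.modified == state.latest_modified and key in state.seen_at_latest:
            continue  # same timestamp and already emitted
        if f.modified > state.latest_modified:
            # A newer timestamp starts a fresh "seen" set.
            state.latest_modified = f.modified
            state.seen_at_latest = set()
        state.seen_at_latest.add(key)
        new_files.append(f)
    return new_files
```

Under this scheme a second run over an unchanged directory emits nothing, while a file whose modification time moves forward is emitted again. To survive a restart, the two fields of `ListingState` would have to be stored outside the process, which is the role the distributed cache plays in the description above.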

I have already used FetchSFTP this way, telling it which files to get in response to an external notification.

[1] https://issues.apache.org/jira/browse/NIFI-673

5 REPLIES

Re: NiFi - How to use the GetSFTP processor without repeating files

There is the "Delete Original" property, which is set to 'true' by default. Is that what you are looking for? With that in mind, I don't understand what you mean about files being modified, since one would expect them to be gone after the download. Can you clarify?

Re: NiFi - How to use the GetSFTP processor without repeating files

Hi Andrew, can you please share the template with us? I'm not sure how to configure FetchSFTP to read the output from ListSFTP. It keeps asking for the remote file location. Thank you!

Re: NiFi - How to use the GetSFTP processor without repeating files

Try the attached template (list-and-fetch-sftp-templatexml.zip). Also check out the repo our team is populating here: https://community.hortonworks.com/repos/6119/apache-nifi-template-rpo.html
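
For readers without the template, the usual way to wire the two processors together is to let FetchSFTP build the remote file location from attributes on the flow files that ListSFTP emits. The sketch below only simulates that substitution in Python. It assumes ListSFTP sets `path` and `filename` attributes and that FetchSFTP's Remote File property is set to `${path}/${filename}`, so verify both against the processor documentation for your NiFi version. `resolve` is a hypothetical helper, not a NiFi API, and the attribute values are made up.

```python
import re
from typing import Dict


def resolve(expression: str, attributes: Dict[str, str]) -> str:
    """Replace each ${name} reference with the matching attribute value.

    A simplified illustration of attribute substitution; it handles plain
    references only, not expression functions.
    """
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], expression)


# Attributes assumed to be present on a flow file produced by the listing step.
listed = {"path": "/data/incoming", "filename": "report.csv"}

# The value assumed for the fetch step's remote file setting.
remote_file = resolve("${path}/${filename}", listed)
print(remote_file)
```

The point of the design is that the fetch step holds no file names of its own: every flow file arriving from the listing step carries the location to download.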

Re: NiFi - How to use the GetSFTP processor without repeating files

Hi,

Is there any way to trigger ListSFTP based on a status condition? That is, ListSFTP should pick up or transfer files depending on some status. For example, if the status is "start" it should start fetching files, and if it is anything else it should stop.