Why doesn't ListSFTP allow upstream connections?


Hello,

I'm working on a flow that currently uses FetchSFTP to download a file from an SFTP server roughly every 15 minutes. It works, but if the file is not there (which is valid in our use case), FetchSFTP throws an error. Our log monitors then report a failure even though nothing is actually wrong; it is just how our flow is set up.

To avoid this error from FetchSFTP, I want to use ListSFTP to check for the file first. However, ListSFTP does not allow upstream connections, so I cannot pass the SFTP server information to it. We load the server information with an upstream GetFile and then use UpdateAttribute to add it to the flowfile.

I don't understand the reasoning for not allowing upstream connections to the ListSFTP processor. Can someone explain why it is designed this way? Also, do you know of another workaround to stop FetchSFTP from throwing an error when it doesn't find a file, since it will just look again in 15 minutes or so?

Thanks for the help.

Also, since a picture is worth a thousand words, here is our current flow with the FetchSFTP getting information from the client config file:

60524-flow.png

1 ACCEPTED SOLUTION


Re: Why doesn't ListSFTP allow upstream connections?

Expert Contributor
@Benjamin Newell

So, the idea of ListSFTP is to provide a list of files from the SFTP server based on filters, etc. The processor is stateful: it lists only the files that have been modified since its last run, and it maintains that state between runs. That is the reason it does not allow incoming connections.
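To make the "stateful" part concrete, here is a rough Python sketch of the idea. It is not NiFi's actual code: `RemoteFile` and `StatefulLister` are made-up names, and the real processor does more (for example, handling files that share the same timestamp). It only shows that the listing depends on state remembered from the previous run, which is tied to one configured server and directory rather than to whatever an incoming flowfile might carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical stand-in for one entry in a remote directory listing."""
    name: str
    modified: int  # last-modified time, epoch seconds


class StatefulLister:
    """Simplified model of a listing step that remembers what it has seen."""

    def __init__(self):
        # State kept between runs: newest modification time already listed.
        self.last_seen = 0

    def list_new(self, remote_files):
        # Only report entries modified after the remembered timestamp.
        new = [f for f in remote_files if f.modified > self.last_seen]
        if new:
            self.last_seen = max(f.modified for f in new)
        return sorted(new, key=lambda f: f.modified)


lister = StatefulLister()
first_run = lister.list_new([RemoteFile("a.csv", 100), RemoteFile("b.csv", 200)])
second_run = lister.list_new(
    [RemoteFile("a.csv", 100), RemoteFile("b.csv", 200), RemoteFile("c.csv", 300)]
)
print([f.name for f in first_run])   # both files are new on the first run
print([f.name for f in second_run])  # only the file added since then
```

If the server or directory changed per flowfile, the remembered timestamp would no longer describe the thing being listed, which is one way to read the design choice.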

Option 1.

FetchSFTP has a "not.found" relationship. You can use it to postpone processing of the file: add a processor that penalizes the flowfile and loop it back to FetchSFTP.

And configure your log monitor to ignore all the messages matching "Failed to fetch content for {} from filename {} on remote host {} because the file could not be found on the remote system; routing to not.found"
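If your log monitor accepts regular expressions, the message template quoted above can be turned into an ignore pattern. The Python below is only a sketch under assumptions: the exact wording may differ between NiFi versions, the function names are hypothetical, and the sample log lines are invented for illustration. Each `{}` placeholder becomes a non-greedy wildcard and the rest of the text is matched literally.

```python
import re

# Message template as quoted in this answer; check it against your own logs.
TEMPLATE = (
    "Failed to fetch content for {} from filename {} on remote host {} "
    "because the file could not be found on the remote system; "
    "routing to not.found"
)


def template_to_regex(template):
    """Escape the literal text and replace each {} with a wildcard."""
    parts = [re.escape(part) for part in template.split("{}")]
    return re.compile(".+?".join(parts))


NOT_FOUND = template_to_regex(TEMPLATE)


def is_expected_miss(line):
    """True when a log line is the 'file not found' message to ignore."""
    return NOT_FOUND.search(line) is not None


# Invented sample lines, for illustration only.
expected = (
    "ERROR FetchSFTP[id=1234] Failed to fetch content for FlowFile[uuid=abcd] "
    "from filename data.csv on remote host sftp.example.com because the file "
    "could not be found on the remote system; routing to not.found"
)
real_problem = "ERROR FetchSFTP[id=1234] Connection refused by sftp.example.com"

print(is_expected_miss(expected))
print(is_expected_miss(real_problem))
```

Matching on the full template rather than just "not.found" keeps other FetchSFTP errors (permissions, connection failures) visible to the monitor.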

Option 2.

First flow: use ListSFTP (typically followed by FetchSFTP) to pull the files and put them in a landing zone (local or NAS).

Second flow: in your existing flow, replace FetchSFTP with FetchFile. FetchFile has a property "Log level when file not found". Set it to "DEBUG" so it won't print errors to the log file.

Option 3.

Create a custom processor that extends FetchSFTP and change onTrigger so that it does NOT log an error message when the file is not found.

Let me know if that helps.


4 REPLIES

Re: Why doesn't ListSFTP allow upstream connections?

Bump. Does anyone have any ideas on this? It feels like a poor design choice not to allow upstream connections on ListSFTP. It doesn't make sense, because almost every property in ListSFTP supports Expression Language. Why would NiFi not let us pass the server information to ListSFTP when it supports Expression Language? So it seems the only way to pass server information to this processor is via the variable registry in 'nifi.properties'?


Re: Why doesn't ListSFTP allow upstream connections?

Thank you, Ed, for the insight on the ListSFTP processor and for providing some possible workarounds. I think this will give me enough to work with to come up with a solution.


Re: Why doesn't ListSFTP allow upstream connections?

I agree it would be nice to allow upstream connections to ListSFTP. However, in answer to the previous comment: you can also set variables on a Process Group by right-clicking on the canvas and selecting the variables option. It is dynamic and much better for my needs than setting them in 'nifi.properties'.
