Several GetSFTP in parallel to different servers

Rising Star

Hi,

I'm new to NiFi. I would like to know how to download files from multiple servers in parallel over SFTP. The number of servers can change over time, and the list of servers (hostnames) is stored in Hive. So my second question is: how can I use the result of the Hive query, which may contain several hostnames, as the input of GetSFTP?

I don't clearly see how to do that in the documentation. Can anyone help me?

Thanks in advance,

Michel

1 ACCEPTED SOLUTION

Expert Contributor

@michelsumbul the GetSFTP processor does not allow you to specify the host dynamically. It is intended to continually poll a specific host, pull the data into NiFi, and then delete the original file from the host. In the open source world it is often important not to delete the data from the source, so we have moved more toward the ListSFTP and FetchSFTP processors. The FetchSFTP processor does allow you to specify the hostname and directory dynamically, so you could perform a query, add FlowFile attributes from its results, and send those FlowFiles to the FetchSFTP processor instead of GetSFTP.

4 REPLIES

Guru

You could feed the list of servers and files in as attributes on FlowFiles from some list source, for example an ExecuteSQL processor running against HiveServer. You would split the results and extract the relevant columns as attributes. Those attributes would then parameterize the settings of a FetchSFTP processor through Expression Language. You can then run multiple concurrent threads of the FetchSFTP processor to work the requests in parallel by raising the Concurrent Tasks setting on the Scheduling tab of the processor configuration.
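
To make the shape of that flow concrete, here is a rough Python sketch of the idea. It is not NiFi code and not how NiFi is implemented. The rows, the attribute names `sftp.host` and `sftp.remote.file`, and the `fetch` stand-in are all made up for illustration, and nothing here touches the network. In a real flow the FetchSFTP Hostname property would presumably be set to something like `${sftp.host}`, assuming that is the attribute name you chose.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows, standing in for the result of the Hive query.
rows = [
    {"hostname": "sftp-a.example.com", "remote_dir": "/data/out", "filename": "a.csv"},
    {"hostname": "sftp-b.example.com", "remote_dir": "/data/out", "filename": "b.csv"},
    {"hostname": "sftp-c.example.com", "remote_dir": "/exports", "filename": "c.csv"},
]


def to_attributes(row):
    """One 'FlowFile' per row: copy the relevant columns into attributes.

    The attribute names are assumptions, not NiFi defaults.
    """
    return {
        "sftp.host": row["hostname"],
        "sftp.remote.file": f"{row['remote_dir']}/{row['filename']}",
    }


def fetch(attributes):
    """Stand-in for FetchSFTP. A real fetch would open an SFTP session here."""
    return f"fetched {attributes['sftp.remote.file']} from {attributes['sftp.host']}"


def run(rows, concurrent_tasks=3):
    # Split the result set into one attribute map per row.
    flowfiles = [to_attributes(row) for row in rows]
    # max_workers plays the role of the Concurrent Tasks setting.
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        # map() returns results in input order, even if fetches finish out of order.
        return list(pool.map(fetch, flowfiles))


results = run(rows)
for line in results:
    print(line)
```

The point of the sketch is that the host is just data on each FlowFile, so adding a server to the Hive table changes what gets fetched without touching the processor configuration.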

Guru

Good point, thanks Mark, I've updated my answer to FetchSFTP, since it needs the FlowFile inputs.

Master Mentor

@Michel Sumbul has this been resolved? Can you post your solution or accept best answer?