Created 01-27-2016 03:29 PM
Hi,
I'm new to NiFi and would like to know how to download files from multiple servers in parallel over SFTP. The number of servers can change over time, and the list of servers (hostnames) is stored in Hive. So my second question is: how can I use the result of a Hive query, which may contain several hostnames, as the input to GetSFTP?
I can't clearly see how to do that from the documentation. Can anyone help me?
Thanks in advance,
Michel
Created 01-27-2016 03:41 PM
@michelsumbul the GetSFTP processor does not allow you to specify the host dynamically. It is intended to continually poll a specific host and pull the data into NiFi, then delete the original file from the host. Often, in the Open Source world, it's important that we not delete the data from the source, so we have moved more toward using the ListSFTP and FetchSFTP processors. The FetchSFTP processor does allow you to specify the hostname and directory dynamically, so you could perform a query, add FlowFile attributes from that query, and send those FlowFiles to the FetchSFTP processor instead of GetSFTP.
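To make the "dynamic hostname" idea concrete, here is a rough Python sketch of what happens conceptually when a property value references FlowFile attributes. It is not NiFi code and does not use any NiFi API: the attribute names (`sftp.host`, `sftp.dir`, `sftp.filename`) are made up for illustration, the property values are only an assumption about how FetchSFTP might be configured, and the `resolve` helper handles plain `${name}` references only, not the full Expression Language.

```python
import re

# Assumed property values for a FetchSFTP processor. The attribute names
# are hypothetical; use whatever names your flow actually sets.
FETCH_SFTP_PROPERTIES = {
    "Hostname": "${sftp.host}",
    "Remote File": "${sftp.dir}/${sftp.filename}",
}

# Matches simple ${attribute.name} references only.
_ATTR_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(template, attributes):
    """Replace each ${name} with the attribute value (empty string if absent)."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


def resolve_properties(properties, attributes):
    """Resolve every property value against one FlowFile's attributes."""
    return {name: resolve(value, attributes) for name, value in properties.items()}


# Attributes as they might look after being extracted from one query row.
flowfile_attributes = {
    "sftp.host": "server-a.example.com",
    "sftp.dir": "/data/out",
    "sftp.filename": "report.csv",
}

resolved = resolve_properties(FETCH_SFTP_PROPERTIES, flowfile_attributes)
print(resolved)
```

Each FlowFile carries its own attributes, so the same processor configuration ends up pointing at a different host and file for every FlowFile that passes through it.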
Created 01-27-2016 03:39 PM
You could feed the list of servers and files in as attributes on FlowFiles from some list source. This could be an ExecuteSQL processor against HiveServer. You would split the results and extract the relevant columns as attributes. These attributes would then parameterize the settings in a FetchSFTP processor through Expression Language. You can then run multiple concurrent threads of the FetchSFTP processor to work the requests in parallel by changing the Concurrent Tasks option on the Scheduling tab of the processor configuration.
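As a rough illustration of the flow described above (query, split into one item per row, fetch with several concurrent tasks), here is a small Python sketch. It only mimics the shape of the flow and is not how NiFi runs internally: `split_rows`, `fetch`, `fetch_all`, and the `sftp.*` attribute names are all hypothetical, the query rows are hard-coded stand-ins for a Hive result, and `fetch` just builds a URL string instead of opening an SFTP connection.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows):
    """Turn each (host, directory, filename) row into one attribute dict,
    similar in spirit to one FlowFile per result row."""
    return [
        {"sftp.host": host, "sftp.dir": directory, "sftp.filename": filename}
        for host, directory, filename in rows
    ]


def fetch(attributes):
    """Placeholder for the SFTP transfer: returns the target instead of connecting."""
    return "sftp://{}{}/{}".format(
        attributes["sftp.host"], attributes["sftp.dir"], attributes["sftp.filename"]
    )


def fetch_all(rows, concurrent_tasks=4):
    """Process all rows with a fixed number of worker threads,
    loosely analogous to raising Concurrent Tasks on the processor."""
    flowfiles = split_rows(rows)
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(fetch, flowfiles))


# Stand-in for the rows returned by the Hive query.
query_rows = [
    ("server-a.example.com", "/data", "a.csv"),
    ("server-b.example.com", "/data", "b.csv"),
]

results = fetch_all(query_rows, concurrent_tasks=2)
print(results)
```

Because the host list comes from the query each time, adding or removing a server in the table changes what gets fetched without touching the rest of the flow.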
Created 01-27-2016 03:58 PM
Good point, thanks Mark, I've updated my answer to FetchSFTP, since it needs the FlowFile inputs.
Created 02-02-2016 03:22 PM
@Michel Sumbul has this been resolved? Can you post your solution or accept best answer?