NIFI - ListSFTP / FETCHSFTP / PUTHDFS
Labels: Apache NiFi
Created ‎10-18-2016 02:41 PM
Hi all,
I'm running a NiFi cluster with 4 nodes.
I would like to set up a dataflow with the SFTP processors.
Is it necessary to have an RPG between ListSFTP and FetchSFTP?
Or can I simply build
ListSFTP (primary node) --> FetchSFTP (all nodes) --> PutHDFS (all nodes)
Regards
Created on ‎10-18-2016 03:31 PM - edited ‎08-19-2019 01:48 AM
@mayki wogno If you want to use the Site-to-Site (S2S) protocol to distribute the SFTP fetches across the 4 NiFi nodes, you will need a Remote Process Group (RPG). ListSFTP would be configured to run only on the primary node and would connect to the RPG, which would point back to the same NiFi cluster.
You would then connect the associated input port to the process group containing the FetchSFTP and PutHDFS processors.
In a NiFi cluster, every node runs the same dataflow (with the exception of isolated processors such as ListSFTP, which run only on the primary node). Without a distribution mechanism such as the S2S protocol, there is no means to partition the file listing metadata so that each processing node fetches a distinct subset of the files on the SFTP server; the listing stays on the primary node, and only that node performs the fetches.
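To make the partitioning idea concrete, here is a small conceptual sketch in Python. It is not NiFi code and does not reproduce how S2S actually balances load; it assumes a simple round-robin assignment, and the node names and file paths are hypothetical. It only illustrates that each listed file should end up on exactly one node, and what happens when the listing never leaves the primary node.

```python
from collections import defaultdict
from itertools import cycle


def distribute_listing(listing, nodes):
    """Assign each listed file to exactly one node (simplified round-robin)."""
    assignments = defaultdict(list)
    for path, node in zip(listing, cycle(nodes)):
        assignments[node].append(path)
    return dict(assignments)


def listing_without_distribution(listing, nodes, primary):
    """Without an RPG, the listing entries never leave the primary node."""
    return {node: (list(listing) if node == primary else []) for node in nodes}


# Hypothetical 4-node cluster and file listing, for illustration only.
nodes = ["node1", "node2", "node3", "node4"]
listing = [f"/data/file_{i}.csv" for i in range(8)]

balanced = distribute_listing(listing, nodes)
unbalanced = listing_without_distribution(listing, nodes, primary="node1")

for node in nodes:
    print(node, "with RPG:", len(balanced[node]), "| without RPG:", len(unbalanced[node]))
```

In the first case every node fetches a distinct subset of the files; in the second, the three non-primary nodes have nothing to fetch.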
Created ‎10-20-2016 07:35 AM
Thanks, I'll try it and let you know whether it works.
Created ‎10-18-2016 04:16 PM
@Slachterman Thanks. For the RPG, is that URL still the one to use if the NiFi cluster is secured?
Created ‎10-18-2016 04:23 PM
That's right, the URL would specify HTTPS and the port on which NiFi is running on that host. With the new masterless architecture in HDF 2.0, the URL specified in the RPG can be any cluster node (in previous versions it had to be the NCM).
Please accept the above answer if it was helpful to you.
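As a rough illustration of the RPG URL for a secured cluster, the sketch below builds candidate URLs in Python. The host names and the port are hypothetical placeholders, and the `/nifi` path is an assumption based on the usual NiFi UI address; the port should match whatever your nodes have set for `nifi.web.https.port` in `nifi.properties`.

```python
from urllib.parse import urlunsplit


def rpg_url(host, https_port, path="/nifi"):
    """Build an HTTPS URL for an RPG pointing at one cluster node."""
    return urlunsplit(("https", f"{host}:{https_port}", path, "", ""))


# Hypothetical node names and port; replace with your own values.
cluster_nodes = ["nifi-node1.example.com", "nifi-node2.example.com"]
https_port = 9091

# With the masterless architecture, any node's URL is a valid choice.
candidates = [rpg_url(host, https_port) for host in cluster_nodes]
print(candidates[0])
```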
