
Limit network bandwidth-Apache NiFi

Can we limit network bandwidth in Apache NiFi data flows?

@vi 

The more details you provide, the more likely you are to get responses in the community.

Since I know you are dealing with GetSFTP, and the files consumed by that processor are eating away at your limited network bandwidth, I can offer the following feedback:

I assume the ~60 GB of files consumed by your GetSFTP every hour are many files? The GetSFTP processor is deprecated in favor of the ListSFTP --> FetchSFTP processor design. SFTP is not a cluster-friendly protocol for a NiFi cluster (and you should always have a NiFi cluster for redundancy and load handling). Running GetSFTP or ListSFTP on all nodes in the cluster would result in every node competing for the same files, so these processors should always be scheduled to run on the "primary node" only (the primary node option does not exist in a standalone NiFi setup).
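
To put the ~60 GB/hour in perspective, here is a rough Python calculation of the sustained rate it implies, plus a deliberately worst-case model in which every node in a 3-node cluster ends up pulling every file. Decimal units (1 GB = 10^9 bytes), the worst-case duplication, and the 3-node cluster size are assumptions for illustration only; the function names are hypothetical.

```python
# Back-of-the-envelope numbers for the ~60 GB/hour figure.
# Assumptions (illustrative only): decimal units (1 GB = 10**9 bytes) and a
# worst case in which every cluster node pulls every file once.


def gb_per_hour_to_mbit_per_s(gb_per_hour: float) -> float:
    """Convert a data volume per hour into a sustained rate in Mbit/s."""
    bits_per_hour = gb_per_hour * 10**9 * 8
    return bits_per_hour / 3600 / 10**6


def worst_case_cluster_rate(gb_per_hour: float, nodes: int) -> float:
    """Rate in Mbit/s if each of `nodes` nodes fetched the same files."""
    return gb_per_hour_to_mbit_per_s(gb_per_hour) * nodes


if __name__ == "__main__":
    single = gb_per_hour_to_mbit_per_s(60)
    print(f"60 GB/hour ~= {single:.1f} Mbit/s sustained")
    print(f"3 nodes all pulling: up to {worst_case_cluster_rate(60, 3):.1f} Mbit/s")
```

Under these assumptions, 60 GB/hour is roughly 133 Mbit/s of sustained traffic, so duplicate pulls from every node would multiply an already significant load.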


The ListSFTP processor does not return the content of the listed files from the SFTP processor.  It simply generates a list of files that need to be fetched from the target SFTP server.  Each of those listed files becomes its own FlowFile in NiFi.  The ListSFTP is then connected to a FetchSFTP processor which will fetch the content for each of the FlowFiles produced by the ListSFTP.  The connection between the ListSFTP and FetchSFTP processor would be configured to load balance the FlowFiles to all nodes in your cluster. This spread out the work load of returning that content across all your cluster nodes.
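
A toy Python model of the List --> Fetch pattern may make the division of labor clearer. Nothing here talks to NiFi or a real SFTP server: REMOTE is a hypothetical in-memory stand-in for the server, FlowFile is a simplified class, and the "path"/"filename" attributes are only loosely modeled on what ListSFTP writes. The listing runs once (as it would on the primary node), and a round-robin split stands in for a load-balanced connection.

```python
import posixpath
from dataclasses import dataclass
from itertools import cycle

# Hypothetical in-memory "SFTP server" used only for this illustration.
REMOTE = {
    "/data/a.csv": b"a" * 10,
    "/data/b.csv": b"b" * 20,
    "/data/c.csv": b"c" * 30,
    "/data/d.csv": b"d" * 40,
}


@dataclass
class FlowFile:
    """Simplified FlowFile: attributes plus (initially empty) content."""
    attributes: dict
    content: bytes = b""


def list_remote(server):
    """Like ListSFTP: one content-less FlowFile per remote file."""
    return [
        FlowFile({"path": posixpath.dirname(p), "filename": posixpath.basename(p)})
        for p in sorted(server)
    ]


def round_robin(flowfiles, nodes):
    """Stand-in for a round-robin load-balanced connection."""
    queues = {node: [] for node in nodes}
    for ff, node in zip(flowfiles, cycle(nodes)):
        queues[node].append(ff)
    return queues


def fetch(ff, server):
    """Like FetchSFTP: pull the content named by the FlowFile's attributes."""
    remote_path = posixpath.join(ff.attributes["path"], ff.attributes["filename"])
    ff.content = server[remote_path]
    return ff


if __name__ == "__main__":
    listed = list_remote(REMOTE)  # listing happens once, on the "primary node"
    queues = round_robin(listed, ["node1", "node2", "node3"])
    for node, queue in queues.items():
        fetched = [fetch(ff, REMOTE) for ff in queue]
        total = sum(len(ff.content) for ff in fetched)
        print(node, [ff.attributes["filename"] for ff in fetched], total, "bytes")
```

With four files and three nodes, node1 fetches a.csv and d.csv while node2 and node3 fetch one file each. The listing step moves almost no data; the content transfer is what gets spread across the nodes.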

While there is no configuration option in the GetSFTP or FetchSFTP processor to limit bandwidth (feel free to open an Apache NiFi Jira for such an improvement), the ListSFTP --> FetchSFTP design does give you some control. You can change the run schedule on the FetchSFTP from the default of 0 sec (which means run as often as possible) to some other value, which places a pause between each execution (that is, between each FlowFile fetching its content). Each fetch will still transfer its content as fast as the network allows, but these breaks between fetches give other operations time on your constrained network.
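
To get a feel for how much a run schedule pause can lower average usage, here is a simplified Python model. It assumes one concurrent task, one FlowFile fetched per execution, each fetch running at full link speed, and the run schedule acting as a fixed pause after each fetch as described above; it ignores protocol overhead and latency. The function names and the 100 MB / 100 Mbit/s figures are hypothetical.

```python
# Simplified model of how a FetchSFTP run schedule pause lowers average
# bandwidth. Assumptions (hypothetical, not NiFi internals): one concurrent
# task, one FlowFile fetched per execution, each fetch runs at full link
# speed, and the run schedule adds a fixed pause after every fetch.


def average_rate_mbit(file_mb: float, link_mbit: float, pause_s: float) -> float:
    """Average Mbit/s used when fetching file_mb-sized files one after another."""
    file_mbit = file_mb * 8  # decimal MB -> Mbit
    transfer_s = file_mbit / link_mbit  # time one fetch takes at full speed
    return file_mbit / (transfer_s + pause_s)


def pause_for_target(file_mb: float, link_mbit: float, target_mbit: float) -> float:
    """Pause (seconds) needed so the average rate drops to target_mbit."""
    if not 0 < target_mbit <= link_mbit:
        raise ValueError("target must be positive and no higher than the link speed")
    file_mbit = file_mb * 8
    return file_mbit / target_mbit - file_mbit / link_mbit


if __name__ == "__main__":
    for pause in (0, 2, 8):
        rate = average_rate_mbit(100, 100, pause)
        print(f"pause {pause} s -> average {rate:.1f} Mbit/s")
    print(f"pause for 50 Mbit/s: {pause_for_target(100, 100, 50):.1f} s")
```

In this model, a 100 MB file on a 100 Mbit/s link takes about 8 s to transfer, so a 2 sec run schedule brings the average down to 80 Mbit/s and an 8 sec pause halves it. The smaller the files, the more the pause dominates; for a single very large file it does little, which is why this is coarse control rather than true rate limiting. In a cluster, each node runs its own FetchSFTP with the same schedule, so the model applies per node.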

Hope this helps,
Matt