We have NiFi configured as a cluster running on Azure using HDP 2.4.
We configured an NCM and two regular NiFi instances. The NCM and NiFi1 are running on one node, and NiFi2 is running on another. We have a remote site running NiFi and connecting to the cluster via an RPG configuration.
The data flows fine from the remote site to the cluster on Azure. However, our SNMP tools are reporting higher bandwidth usage than expected.
The RPG consistently reports an average of 13.25 MB over the last 5 minutes, and the flowfiles are almost identical in size. However, our SNMP tools report constant traffic of 200 KB/s (~58.59 MB per 5 minutes) to each of NiFi1 and NiFi2, i.e., roughly 120 MB per 5 minutes in total. That is very high consumption for a source payload of 13.25 MB per 5 minutes. As a test, we disabled the site-to-site transmission, and within a few seconds the bandwidth usage dropped to zero. This rules out some other process constantly pumping data.
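For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion. It assumes the SNMP tool's "KB" means 1024 bytes (which matches the ~58.59 MB figure) and that the 13.25 MB reported by the RPG is the total sent to both nodes; the function name is just for illustration.

```python
KIB = 1024
MIB = 1024 * 1024

def mib_in_window(rate_kib_per_s, window_s=300):
    """Convert a constant rate in KiB/s into MiB transferred over a window."""
    return rate_kib_per_s * KIB * window_s / MIB

# Traffic observed by SNMP: 200 KB/s to each of the two nodes.
per_node_mib = mib_in_window(200)
total_mib = 2 * per_node_mib

# Payload reported by the RPG over the same 5-minute window
# (assumed here to be the total across both nodes).
payload_mib = 13.25

overhead_factor = total_mib / payload_mib

print(f"per node: {per_node_mib:.2f} MiB / 5 min")
print(f"total:    {total_mib:.2f} MiB / 5 min")
print(f"wire traffic is about {overhead_factor:.1f}x the reported payload")
```

If the 13.25 MB figure is actually per node rather than the total, the factor is half of what this prints.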
To all the NiFi gurus out there: where is this overhead from 13.25 MB to 58.59 MB coming from? Is it overhead from using the site-to-site protocol? Is there some configuration that can be turned on to bring it down?
Thanks in advance for your help.
Are you using the raw socket site-to-site or the HTTP site-to-site? I would expect raw to have less overhead on the wire than HTTP, although it is less flexible as far as proxying and firewall ports are concerned. The protocol also supports compression, which can be turned on through the UI.
Since they say they have an NCM, they are not running the latest version of NiFi. In order to use HTTP site-to-site, they would need to switch from the Apache NiFi 0.x (HDF 1.x) versions to the newer NiFi 1.x (HDF 2.x) versions that support the new HTTP protocol.
@Vamsi Mohan Thattikota Do your FlowFiles have a lot of attributes? Site-to-Site sends not only the FlowFile content but the attributes as well. So if you are sending a large number of FlowFiles with many or large attributes, that could account for quite a bit of bandwidth usage. If you right-click on the Remote Process Group and then click on Remote Ports, you can configure each port individually to indicate whether or not it should compress the data. If you turn on compression there, you will be compressing not only the content but the attributes as well. Since the attributes are textual, you can expect quite a large compression ratio, like 80% or more, which would significantly reduce the amount of bandwidth being used.
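To get a rough feel for how well textual attributes compress, here is a small sketch using Python's standard `zlib` module. It is only a stand-in: it does not reproduce NiFi's actual wire format or compression implementation, and the attribute names and values below are made up for illustration, so the savings on real data may differ.

```python
import uuid
import zlib

def fake_attributes(i):
    """Build a hypothetical attribute map for the i-th FlowFile."""
    return {
        "uuid": str(uuid.uuid5(uuid.NAMESPACE_DNS, f"flowfile-{i}")),
        "filename": f"sensor-reading-{i:06d}.json",
        "path": "/data/incoming/site-a/",
        "mime.type": "application/json",
        "source.system": "remote-site-nifi",
    }

def serialize(attrs):
    """Serialize attributes as simple key=value lines (illustrative only)."""
    return "".join(f"{k}={v}\n" for k, v in attrs.items()).encode("utf-8")

# Attributes for 1000 FlowFiles, concatenated as they might be in one batch.
raw = b"".join(serialize(fake_attributes(i)) for i in range(1000))
compressed = zlib.compress(raw)

savings = 1 - len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(f"savings: {savings:.0%}")
```

Most of the attribute text repeats from one FlowFile to the next, which is why it compresses well. Content that is already compressed (for example gzip or JPEG) will not shrink much further.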