@John T I have recently built out an HDF environment for a Fortune 1 retail company to handle 1-2k connections per node and move an average of 1-1.5TB a day. We utilized the HandleHTTP processors as MiNiFi was not an option at project conception. If you are using the HandleHTTPRequest/Response processors, note that there is a bug which causes objects to not be released correctly causing heap utilization to climb in a linear fashion. Our workaround was to utilize the API to stop/start the HandleHTTPRequest processor when the heap reached 70%. This bug was corrected in the 1.6 release of NiFi but has not been rolled up into an HDF release since I last checked. So, handling that kind of volume will cause the same scenario in your situation. If you can use ListenHTTP (or MiNiFi as Matt suggested), you should be fine. We were utilizing external load balancers as we were running three clusters in separate data centers. The plan in the next phase is to start utilizing MiNiFi in the edge environments and point the different systems feeding data into HDF at those MiNiFi HTTP listeners. If you are running a single cluster, as Matt mentioned, that would load balance for you.
... View more