Created 12-21-2018 09:03 PM
We want to deploy NiFi in cluster mode in our production environment and would like to know the best practices and guidelines for a large-scale deployment with multiple sources and a total traffic volume in the range of 300 TB. Can someone advise what NiFi configuration, disk I/O, and memory would be needed to handle that kind of capacity?
Created 12-22-2018 07:58 AM
Hi Ajay,
Here is a sizing guide that seems to address exactly your questions:
https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html
Still, I personally wouldn't start with 8 GB RAM per node, but with at least 16 GB (2 GB per core). In any case, you will need to be clear on the required throughput (Gb/sec), not only on the overall volume.
Regards
Harald
Created 12-23-2018 05:38 AM
Hi @Ajay Sachdev,
As @Harald Berghoff points out, overall volume is less relevant. To give a better picture, we need to know the estimated throughput and which processors you want to use in the flow, as each operation will require memory and I/O.
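To illustrate why the total volume alone is not enough, here is a minimal sketch that converts a volume into the sustained rate it implies. The time windows (1, 30, and 365 days) are assumptions for illustration only, since the question does not say over what period the 300 TB arrives. The function name is hypothetical, and TB is taken as decimal (10^12 bytes).

```python
def required_throughput_gbps(total_tb: float, window_days: float) -> float:
    """Sustained rate in gigabits per second needed to move
    total_tb terabytes (decimal, 10**12 bytes) within window_days."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_bits = total_tb * 1e12 * 8      # bytes -> bits
    seconds = window_days * 86_400        # days -> seconds
    return total_bits / seconds / 1e9     # bits/sec -> Gb/sec


if __name__ == "__main__":
    # Assumed windows; replace with the real ingest period.
    for days in (1, 30, 365):
        rate = required_throughput_gbps(300, days)
        print(f"300 TB over {days} day(s): {rate:.2f} Gb/sec")
```

The same 300 TB needs roughly 27.8 Gb/sec if it arrives in one day but under 1 Gb/sec if it is spread over a month, which leads to very different cluster sizes. This is also an average; peak rates may be considerably higher.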
Created 12-26-2018 01:37 PM
Both answers already provided are good. Let me explain why:
NiFi is a flow-based programming tool. While NiFi's core itself requires very few resources (CPU and memory) to run, every user of NiFi builds their own unique dataflow(s) on the NiFi canvas, and each of those has its own resource impact and requirements.
Even knowing exactly which processors you will be using, how many of each, and the volume/rate of data passing through each would not allow anyone to "exactly" calculate the resource footprint of your dataflow(s).
The configuration of these components (processors, connections, controller services, reporting tasks, etc.) and the core (connection swap thresholds, status history retention, etc.) will also impact resource utilization.
It is best to design your dataflow(s) and test the resource impact yourself. NiFi provides some processors, such as "GenerateFlowFile", that can help you test your flows under load.
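When setting up such a test, it helps to estimate how much load a given GenerateFlowFile configuration will produce. The sketch below is a rough upper-bound estimate, assuming timer-driven scheduling with a non-zero Run Schedule and a single concurrent task. The function name is hypothetical; the inputs correspond to the processor's "File Size" and "Batch Size" properties and its Run Schedule setting. The actual rate can be lower if the node cannot keep up or back pressure is applied downstream.

```python
def generated_rate_mb_per_sec(file_size_bytes: int,
                              batch_size: int,
                              run_schedule_sec: float) -> float:
    """Approximate upper bound on the data rate (decimal MB/sec)
    produced by one GenerateFlowFile task that emits batch_size
    FlowFiles of file_size_bytes each, once per run_schedule_sec."""
    if run_schedule_sec <= 0:
        # A 0 sec schedule means "run as fast as possible",
        # which this simple estimate does not model.
        raise ValueError("run_schedule_sec must be positive")
    bytes_per_run = file_size_bytes * batch_size
    return bytes_per_run / run_schedule_sec / 1e6


if __name__ == "__main__":
    # Assumed example values: 1 MB files, 10 per run, one run per second.
    rate = generated_rate_mb_per_sec(1_000_000, 10, 1.0)
    print(f"Estimated generated load: {rate:.1f} MB/sec")
```

Comparing this estimate against what the NiFi UI reports for the processor's output shows whether the test is really reaching the target rate before you start measuring CPU, memory, and disk I/O.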
Thank you,
Matt