How does NiFi handle a large volume of data traffic (e.g. 300+ TB)?

New Contributor

We want to deploy NiFi in cluster mode in our production environment and would like to know the best practices and guidelines for using NiFi in a large-scale deployment with multiple sources and a total traffic volume in the range of 300 TB. Can someone advise what NiFi configuration, disk I/O, and memory would be needed to handle that kind of capacity?

3 REPLIES

Super Collaborator

Hi Ajay,

Here is a sizing guide that seems to address exactly your questions:

https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html

Still, I personally wouldn't start with 8 GB RAM per node but with at least 16 GB (2 GB per core). In any case, you will need to be clear on the throughput required (Gb/sec), not only on the overall volume.

Regards
Harald

Contributor

Hi @Ajay Sachdev,

As @Harald Berghoff points out, the overall volume is less relevant. To get a better picture, we need to know the estimated throughput and which processors you want to use in the flow, since each operation will require memory and I/O.
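To illustrate why throughput matters more than total volume, here is a rough Python sketch that converts a volume and a time window into a sustained rate. The function name, the time windows, and the node count are assumptions made up for illustration; the original question only states the 300 TB figure, not the period over which it arrives.

```python
def required_throughput(total_tb: float, window_days: float, nodes: int = 1) -> dict:
    """Convert a total volume and a time window into sustained rates.

    Assumes decimal units (1 TB = 1e12 bytes) and a perfectly even
    arrival rate, which real traffic rarely has.
    """
    if window_days <= 0 or nodes <= 0:
        raise ValueError("window_days and nodes must be positive")
    total_bytes = total_tb * 1e12
    seconds = window_days * 86_400
    bytes_per_sec = total_bytes / seconds
    return {
        "MB_per_sec": bytes_per_sec / 1e6,
        "Gbit_per_sec": bytes_per_sec * 8 / 1e9,
        "MB_per_sec_per_node": bytes_per_sec / 1e6 / nodes,
    }


# Hypothetical windows: the same 300 TB gives very different requirements.
monthly = required_throughput(300, window_days=30)
daily = required_throughput(300, window_days=1, nodes=3)
print(f"300 TB over 30 days: {monthly['MB_per_sec']:.1f} MB/s")
print(f"300 TB over 1 day:   {daily['MB_per_sec']:.1f} MB/s "
      f"({daily['MB_per_sec_per_node']:.1f} MB/s per node on 3 nodes)")
```

Spread over a month, 300 TB is roughly 116 MB/s across the whole cluster; arriving in a single day, it is roughly 3.5 GB/s. These are averages only, so peak rates would need extra headroom.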

Super Mentor
@Ajay Sachdev

Both answers already provided are good. Let me explain why:

NiFi is a flow-based programming tool. While NiFi's core itself requires very few resources (CPU and memory) to run, every user of NiFi builds their own unique dataflow(s) on the NiFi canvas, and each of those has its own unique resource impact and requirements.

Even knowing exactly which processors you will be using, how many of each, and volume/rate of data passing through each would not allow anyone to "exactly" calculate the resource footprint of your dataflow(s).

The configuration of these components (processors, connections, controller services, reporting tasks, etc.) and the core (connection swap thresholds, status history retention, etc.) will also impact resource utilization.

It is best to design your dataflow(s) and test the resource impact yourself. NiFi provides processors such as "GenerateFlowFile" that can help you test your flows under load.
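When planning such a load test, it can help to estimate how much data a GenerateFlowFile configuration will attempt to produce. The Python sketch below is only back-of-the-envelope arithmetic: the function and parameter names are made up for illustration, loosely mirroring the processor's File Size and Batch Size properties and its Run Schedule and Concurrent Tasks scheduling settings. It gives an upper bound, since the real rate will be lower if the processor cannot keep up or back pressure kicks in.

```python
def generated_rate_mb_per_sec(
    file_size_bytes: int,
    batch_size: int,
    run_schedule_sec: float,
    concurrent_tasks: int = 1,
    nodes: int = 1,
) -> float:
    """Upper-bound estimate of test data generated per second, in MB.

    Assumes every scheduled run completes within the run schedule and
    that the processor runs on all nodes of the cluster.
    """
    if run_schedule_sec <= 0:
        # A 0 sec schedule means "run as fast as possible", so this
        # simple formula does not apply.
        raise ValueError("run_schedule_sec must be positive")
    per_task = file_size_bytes * batch_size / run_schedule_sec
    return per_task * concurrent_tasks * nodes / 1e6


# Hypothetical settings: 1 MB files, 10 per run, one run per second, 3 nodes.
rate = generated_rate_mb_per_sec(1_000_000, batch_size=10,
                                 run_schedule_sec=1.0, nodes=3)
print(f"Target generation rate: {rate:.1f} MB/s")
```

Comparing a target figure like this with what NiFi actually reports while the flow runs, alongside CPU, heap, and disk I/O on each node, shows where the flow starts to fall behind.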

Thank you,

Matt