NiFi Sizing, Benchmark Conditions and Number of Sources that can be served


Is there any documentation on the number of input sources a NiFi cluster can connect to? I suspect there is no such thing as a fixed "number of source systems". I have seen the sizing chart posted here by @Matt Clarke.

Is it fair to assume that at most 2 source systems can connect to a 2-node cluster, given that each source streams 500 messages at 25 MB/s?
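The arithmetic behind this question can be made explicit. The sketch below assumes (hypothetically) that ingest is spread evenly across the cluster nodes, for example by a load balancer in front of the listeners; the 500-message count does not change the byte-rate math, so it is left out. The function names are illustrative only.

```python
def aggregate_ingest_mb_s(num_sources: int, mb_per_s_per_source: float) -> float:
    """Total ingest rate the cluster must absorb, in MB/s."""
    return num_sources * mb_per_s_per_source


def per_node_ingest_mb_s(num_sources: int, mb_per_s_per_source: float, num_nodes: int) -> float:
    """Per-node ingest rate, ASSUMING load is spread evenly across nodes (hypothetical)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return aggregate_ingest_mb_s(num_sources, mb_per_s_per_source) / num_nodes


if __name__ == "__main__":
    # Scenario from the question: 2 sources at 25 MB/s each, 2-node cluster.
    total = aggregate_ingest_mb_s(2, 25.0)
    per_node = per_node_ingest_mb_s(2, 25.0, 2)
    print(f"aggregate: {total} MB/s, per node: {per_node} MB/s")
```

Under this simplified model the quantity that matters is total bytes per second per node, not the count of sources: four sources at 12.5 MB/s each would put the same 25 MB/s on each node.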

Also, is there any documentation describing the dataflows, processors, and workload under which the sizing chart was created?

Accepted Solution

For IO

The throughput or latency you can expect varies greatly depending on how the system is configured. Because most of the major NiFi subsystems have pluggable implementations, performance depends on which implementation is in use. For something concrete and broadly applicable, consider the out-of-the-box default implementations. These are all persistent, provide guaranteed delivery, and use local disk. To be conservative, assume a read/write rate of roughly 50 MB per second on modest disks or RAID volumes in a typical server. For a large class of dataflows, NiFi should then be able to reach 100 MB per second or more of throughput efficiently, because throughput is expected to grow linearly with each physical partition and content repository added to NiFi. At some point, this growth will bottleneck on the FlowFile repository and the provenance repository. We plan to provide a benchmarking and performance-test template to include in the build, which will let users easily test their system, identify where the bottlenecks are, and see at what point they become a factor. This template should also make it easy for system administrators to make changes and verify their impact.
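A rough model of the scaling described above: throughput grows linearly with the number of content repository partitions on separate physical disks, until it reaches a ceiling imposed by the FlowFile and provenance repositories. The ~50 MB/s per-disk figure is the conservative estimate from the text; `repo_ceiling_mb_s` is a hypothetical placeholder you would have to measure on your own hardware, and reading "2 disks x 50 MB/s = 100 MB/s" as the link between the text's two figures is an interpretation, not a confirmed derivation. The helper also renders `nifi.properties` entries, since NiFi accepts multiple content repositories through properties prefixed with `nifi.content.repository.directory.`; the suffixes and paths used here are illustrative.

```python
from __future__ import annotations


def estimated_throughput_mb_s(
    content_partitions: int,
    per_disk_mb_s: float = 50.0,
    repo_ceiling_mb_s: float = float("inf"),
) -> float:
    """Linear scaling per content-repository partition, capped at a
    (hypothetical, measured-by-you) FlowFile/provenance repository ceiling."""
    if content_partitions < 1:
        raise ValueError("need at least one content repository partition")
    return min(content_partitions * per_disk_mb_s, repo_ceiling_mb_s)


def content_repo_properties(paths: dict[str, str]) -> list[str]:
    """Build nifi.properties lines, one content repository per physical disk.
    Each key becomes the property suffix (e.g. 'default', 'repo2')."""
    return [f"nifi.content.repository.directory.{name}={path}" for name, path in paths.items()]


if __name__ == "__main__":
    # Two partitions at the conservative 50 MB/s each -> roughly 100 MB/s.
    print(estimated_throughput_mb_s(2))
    # With a hypothetical 150 MB/s repository ceiling, a 4th disk adds nothing.
    print(estimated_throughput_mb_s(4, repo_ceiling_mb_s=150.0))
    # Illustrative paths: put each repository on its own physical disk.
    for line in content_repo_properties({"default": "/disk1/content", "repo2": "/disk2/content"}):
        print(line)
```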

For CPU

The Flow Controller acts as the engine that decides when a particular processor is given a thread to execute. Processors are written to return the thread as soon as they finish executing a task. The Flow Controller can be configured with the number of threads available to each of the thread pools it maintains. The ideal number of threads depends on the host's resources (number of cores), whether the host is also running other services, and the nature of the processing in the flow. For typical IO-heavy flows, it is reasonable to make many dozens of threads available.
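As a starting point for the main thread-pool setting (in the NiFi UI, Maximum Timer Driven Thread Count under Controller Settings), a frequently cited rule of thumb is 2 to 4 times the number of cores. Treat that multiplier as an assumption rather than a NiFi default. The sketch also subtracts cores reserved for other services on the same host, reflecting the considerations listed above; the function name is hypothetical.

```python
from __future__ import annotations

import os


def suggested_thread_range(cores: int, reserved_cores: int = 0,
                           low_mult: int = 2, high_mult: int = 4) -> tuple[int, int]:
    """Return (low, high) starting values for the timer-driven thread pool.

    ASSUMPTION: the 2x-4x multiplier is a rule of thumb, not a NiFi default.
    reserved_cores models other services sharing the host.
    """
    usable = max(cores - reserved_cores, 1)  # always leave NiFi at least one core
    return usable * low_mult, usable * high_mult


if __name__ == "__main__":
    detected = os.cpu_count() or 1  # os.cpu_count() may return None
    print(suggested_thread_range(detected))
    # 8-core host with 2 cores set aside for other services -> (12, 24)
    print(suggested_thread_range(8, reserved_cores=2))
```

For IO-heavy flows the text suggests many dozens of threads can be reasonable, so the upper end of the range (or beyond it) may apply; flows that are mostly CPU-bound plausibly sit nearer the lower end. Either way, adjust while watching actual CPU utilization.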

For RAM

NiFi runs within the JVM and is therefore limited to the memory the JVM is given. JVM garbage collection becomes a very important factor, both in limiting the practical total heap size and in how well the application runs over time. NiFi jobs can be I/O intensive when they read the same content repeatedly, so configure a large enough disk to optimize performance.
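Heap size is set in NiFi's `conf/bootstrap.conf` through JVM arguments; in the default file of many NiFi releases these are the `java.arg.2=-Xms...` and `java.arg.3=-Xmx...` lines, but check the indices in your own copy before reusing them. Setting the initial and maximum heap to the same value is a common practice that avoids heap resizing at runtime. The sketch below renders those two lines and applies a hypothetical guard that keeps the heap to at most half of physical RAM, leaving the rest for the OS page cache that repeated content reads can benefit from; the 50% figure is an assumption, not a NiFi rule.

```python
from __future__ import annotations


def heap_args(heap_gb: int, physical_ram_gb: int, max_fraction: float = 0.5) -> list[str]:
    """Render bootstrap.conf heap lines with Xms == Xmx.

    ASSUMPTIONS: arg indices 2 and 3 match your bootstrap.conf; max_fraction
    is a hypothetical guard that leaves RAM for the OS page cache.
    """
    if heap_gb < 1:
        raise ValueError("heap_gb must be at least 1")
    if heap_gb > physical_ram_gb * max_fraction:
        raise ValueError(
            f"{heap_gb} GB heap exceeds {max_fraction:.0%} of {physical_ram_gb} GB RAM"
        )
    return [f"java.arg.2=-Xms{heap_gb}g", f"java.arg.3=-Xmx{heap_gb}g"]


if __name__ == "__main__":
    # 8 GB heap on a 32 GB host passes the (hypothetical) 50% guard.
    for line in heap_args(8, physical_ram_gb=32):
        print(line)
```

A bigger heap is not automatically better: larger heaps can lengthen garbage-collection pauses, which is the practical limit on heap size that the text describes.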

See: https://community.hortonworks.com/questions/22685/capacity-planning-for-nifi-cluster.html

See: https://community.hortonworks.com/questions/4098/nifi-sizing.html

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices

https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-...

https://community.hortonworks.com/content/kbentry/9785/nifihdf-dataflow-optimization-part-2-of-2.htm...

http://apache-nifi.1125220.n5.nabble.com/Nifi-Benchmark-Performance-tests-td1099.html

http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.1/bk_dataflow-overview/content/performance-exp...
