NiFi throughput calculation for minimum hardware

Contributor

Hi,

I have gone through multiple community support links, which mainly say that a starting cluster for, say, 50 MB/s and 1000 events/sec sustained throughput with an average flow is:

  • 3 nodes each with:
    • CPU: 8+ cores (16 is preferred)
    • Memory: 8+ GB
    • Disk: 6 disks, each 1TB disks (could be spinning or SSD)

Referred link: https://community.cloudera.com/t5/Community-Articles/NiFi-Sizing-Guide-Deployment-Best-Practices/ta-...

My NiFi (version 1.23.2) cluster has the hardware configuration below, and disk is a concern in my environment: each node has only one disk, and all repositories use the default NiFi configuration on that same disk. What throughput can be expected in these cases?

  • 3 nodes each with:
    • CPU: 8+ cores (16 is preferred)
    • Memory: 8+ GB
    • Disk: 1 disk of 500GB (could be spinning or SSD)

 

  • 5 nodes each with:
    • CPU: 8+ cores (16 is preferred)
    • Memory: 8+ GB
    • Disk: 1 disk of 500GB (could be spinning or SSD)

 

  • 7 nodes each with:
    • CPU: 8+ cores (16 is preferred)
    • Memory: 8+ GB
    • Disk: 1 disk of 500GB (could be spinning or SSD)
4 REPLIES

Contributor

Hi, Can anyone please update this query?


@PriyankaMondal first of all, I would personally not recommend using a single disk. NiFi's repositories should each be stored on a different disk, especially with failures in mind. Next, the size of your HDD/SSD should be chosen based on your use case: working with lots of data requires more space than working with less data (in terms of number of files, not size of files).


Secondly, cluster configuration is not something you can read on a forum and expect to apply 100% to your case. When configuring a NiFi cluster you need to take several aspects into consideration, mostly related to what you are planning to do with your cluster. For example, if you are going to use NiFi for lots of streaming data, it is recommended to go with SSDs instead of simple HDDs. If you are going to work mostly with batches and process/modify the information you receive (for example adding new columns based on other columns, aggregating data, using lookups, etc.), you would need more RAM and CPU than you would for a streaming cluster.

Lastly, the number of nodes again depends on the scope of the cluster. Using it for batching and working on larger chunks of data (for example extracting data from a DB and inserting it into AWS) does not require as many nodes as a streaming cluster which reads a constant amount of data from Kafka, for example.

My suggestion would be to first identify what use cases you are going to implement on your NiFi cluster, understand the data throughput, and build your cluster based on those findings. And for safety reasons, use multiple HDDs/SSDs for NiFi's repositories and don't use the default configuration, with them saved into your conf folder 🙂
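
As a rough illustration of the "understand the data throughput" step, here is a back-of-envelope sketch in Python. It does not predict what a cluster will actually achieve, since that depends on the flow design. It only shows the arithmetic for how long a single 500 GB disk per node would last at the 50 MB/s reference rate from the question. The even load distribution, the 50% usable share of the disk, and the function names are all assumptions made for this example, not NiFi settings or measured values.

```python
# Back-of-envelope disk sizing sketch. All inputs are illustrative assumptions.

def per_node_rate_mb_s(cluster_rate_mb_s: float, nodes: int) -> float:
    """Ingest rate each node must sustain if load is spread evenly."""
    return cluster_rate_mb_s / nodes


def hours_until_full(disk_gb: float, usable_fraction: float,
                     node_rate_mb_s: float) -> float:
    """Hours of incoming data a node can hold before its usable share of the
    disk fills, assuming nothing is deleted in the meantime (1 GB = 1000 MB)."""
    usable_mb = disk_gb * 1000 * usable_fraction
    return usable_mb / node_rate_mb_s / 3600


CLUSTER_RATE_MB_S = 50.0  # reference rate quoted in the question
DISK_GB = 500.0           # single disk per node, from the question
USABLE_FRACTION = 0.5     # assumption: half the disk is available for content

for nodes in (3, 5, 7):
    rate = per_node_rate_mb_s(CLUSTER_RATE_MB_S, nodes)
    hours = hours_until_full(DISK_GB, USABLE_FRACTION, rate)
    print(f"{nodes} nodes: {rate:.1f} MB/s per node, "
          f"about {hours:.1f} h of data before the usable space fills")
```

Replace the assumed numbers with rates measured on your own flow. On a single shared disk the repositories also compete for the same I/O, which this arithmetic does not capture.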

Master Mentor

@PriyankaMondal 

Very simply... What @cotopaul responded with.

One of the biggest factors in performance is your dataflow design itself. Apache NiFi offers many pluggable components for building out your dataflows, and not all of them perform the same. While NiFi makes it easy to create dataflows, building the highest-performing dataflow can take some trial and error. I'd always recommend testing and modeling to understand the performance characteristics of the dataflow you built. Identify where your bottlenecks are and adjust. Try different designs using different processors when possible. Work with records instead of many small individual FlowFiles when possible for better performance.

If you found that any of the suggestions/solutions provided helped you with your issue, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt



Community Manager

@PriyankaMondal, Did any of the responses assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future. 



Regards,

Vidya Sargur,
Community Manager

