Correct NiFi sizing

Contributor

Hello,

In the various official guides and articles around the community, there are different (conflicting) suggestions regarding NiFi sizing. The HDF documentation suggests that 4GB of RAM per node is required to achieve a sustained throughput of 200MB/s. Should I consider this throughput to be per node or per cluster?

This article here seems to suggest 16GB of RAM for the same use case. For a similar one, 64GB of RAM is suggested here.

Let's consider a medium-difficulty topology. Which one is the correct sizing for the RAM?

I am also wondering about the number of cores. I noticed from the official documentation that scaling from "3-4 nodes" (8 cores, 100MB/s) to "5-7 nodes" (24 cores, 200MB/s) triples the minimum number of cores required per node.

Starting from 3 nodes and scaling up to 6 would therefore mean roughly a 6x increase in cost. Why should I prefer this configuration to just 2 smaller clusters (aside from the obvious management issues)?
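To make the arithmetic behind that 6x figure explicit, here is a small sketch. It assumes the core counts quoted from the documentation are per node and that cost scales with total core count; both are my assumptions for illustration, not something the documentation states.

```python
# Figures quoted above; "cores_per_node" assumes the documented core
# counts are per node (an assumption made for this comparison).
small = {"nodes": 3, "cores_per_node": 8, "throughput_mb_s": 100}
large = {"nodes": 6, "cores_per_node": 24, "throughput_mb_s": 200}


def total_cores(cluster):
    """Total cores in a cluster, used here as a rough proxy for cost."""
    return cluster["nodes"] * cluster["cores_per_node"]


small_cores = total_cores(small)      # 3 * 8
large_cores = total_cores(large)      # 6 * 24
two_small_cores = 2 * small_cores     # two independent 3-node clusters

# Cost relative to a single small cluster, assuming cost ~ core count.
cost_ratio_large = large_cores / small_cores
cost_ratio_two_small = two_small_cores / small_cores

print(f"one 6-node cluster:  {large_cores} cores, {cost_ratio_large:.0f}x cost")
print(f"two 3-node clusters: {two_small_cores} cores, {cost_ratio_two_small:.0f}x cost")
```

Under these assumptions both options target 200MB/s in total, but the single larger cluster needs three times the cores of the two smaller ones.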

Thanks

1 ACCEPTED SOLUTION

Mentor
@Raffaele S

It is very difficult to make a one-size-fits-all sizing recommendation for NiFi. NiFi does not typically scale linearly, which is why you see the hardware specs grow much faster than the throughput does. This reflects the fact that, in most cases, typical NiFi workflows grow in size and complexity as the volume of throughput increases: more and more workflows are added.

-

Different NiFi processors in different workflows contribute to different server resource usage. That resource usage varies based on processor configuration and FlowFile volume, so even two workflows using the same processors may have different sizing needs.

-

How well NiFi performs has a lot to do with the workflow the user has built. After all, it is this user-designed workflow that will be using the majority of the resources on each node.

-

The best answer, to be honest, is to build your workflows and stress test them. Treat this as a modeling and simulation exercise. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk IO become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations accordingly. Then scale out by adding more nodes, allowing some headroom, since it is very unlikely that every node will be processing exactly the same number of NiFi FlowFiles all the time.
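As a rough illustration of that process, here is a minimal sketch of turning stress-test results into a node count. The function names, the utilization figures, and the 25% headroom are all hypothetical placeholders, not NiFi APIs or recommended values; substitute the numbers you measure for your own workflows.

```python
import math


def find_bottleneck(utilization):
    """Return the resource with the highest measured utilization (0.0-1.0)."""
    return max(utilization, key=utilization.get)


def nodes_needed(target_mb_s, per_node_mb_s, headroom=0.25):
    """Estimate node count for a target throughput.

    per_node_mb_s is the sustained throughput one node reached in your
    own stress test; headroom is the fraction of that capacity left
    unused to absorb uneven FlowFile distribution across nodes.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = per_node_mb_s * (1.0 - headroom)
    return math.ceil(target_mb_s / usable_per_node)


# Hypothetical measurements from a single-node stress test.
measured = {"cpu": 0.92, "network": 0.40, "memory": 0.55, "disk_io": 0.70}

print("bottleneck:", find_bottleneck(measured))
print("nodes for 200 MB/s:", nodes_needed(200, per_node_mb_s=60, headroom=0.25))
```

The estimate is only as good as the stress test behind it, so re-measure per-node throughput whenever the workflows or component configurations change.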

-

Thanks,

Matt
