Support Questions

rsg · 02-14-2018

Hello,

In the various official guides and articles around the community, there are different (conflicting) suggestions regarding NiFi sizing. The HDF documentation suggests that to achieve a sustained throughput of 200MB/s, 4GB of RAM per node is required. Should I consider this throughput to be per node or per cluster?

This article here seems to suggest 16GB of RAM for the same use case. For a similar one, 64GB of RAM is suggested here.

Let's consider a topology of medium complexity. Which of these is the correct RAM sizing?

I am also wondering about the number of cores. I noticed from the official documentation that scaling from "3-4 nodes" (8 cores, 100MB/s) to "5-7 nodes" (24 cores, 200MB/s) triples the minimum number of cores required.

Starting from 3 nodes and scaling up to 6 would mean a sixfold increase in cost. Why should I prefer this configuration to just 2 smaller clusters (aside from the obvious management issues)?
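The sixfold figure follows if the core counts quoted above are read as cores per node, which is an assumption here and not something stated in the post. A minimal Python sketch of that arithmetic, with hypothetical variable names:

```python
# Sizing tiers as quoted in the question. Reading "cores" as cores per node
# is an assumption; the question does not say so explicitly.
small = {"nodes": 3, "cores_per_node": 8, "throughput_mb_s": 100}
large = {"nodes": 6, "cores_per_node": 24, "throughput_mb_s": 200}


def total_cores(cluster):
    """Total core count of a cluster: nodes times cores per node."""
    return cluster["nodes"] * cluster["cores_per_node"]


# Twice the nodes, each with three times the cores.
core_ratio = total_cores(large) / total_cores(small)
throughput_ratio = large["throughput_mb_s"] / small["throughput_mb_s"]

print(f"total cores: {total_cores(small)} -> {total_cores(large)}")
print(f"core ratio: {core_ratio:.0f}x for {throughput_ratio:.0f}x throughput")
```

Under that reading, six times the cores buy twice the throughput, which is the gap the question is asking about.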

Thanks

MattWho · 05-07-2018

@Raffaele S

NiFi is a very difficult thing to make a one-size-fits-all sizing recommendation for. NiFi does not typically scale linearly, which is why you see the hardware specs increase faster than the throughput does. This is based on the fact that, in most cases, typical NiFi workflows grow in size and complexity as the volume of throughput increases: more and more workflows are added.

Different NiFi processors in different workflows contribute to different server resource usage. That resource usage varies based on processor configuration and FlowFile volume. So even two workflows using the same processors may have different sizing needs.

How well NiFi is going to perform has a lot to do with the workflow the user has built. After all, it is this user-designed workflow that is going to be using the majority of the resources on each node.

The best answer, to be honest, is to build your workflows and stress test them. This is a kind of modeling and simulation setup. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk I/O become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations. Then scale out by adding more nodes, allowing some headroom, since it is very unlikely every node will be processing the exact same number of NiFi FlowFiles all the time.
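As a rough illustration of that last step, here is a minimal Python sketch that turns stress-test results into a first node-count estimate. The function names, the utilization figures, and the 25% headroom are all hypothetical, and the calculation assumes throughput adds up roughly linearly across nodes, which should itself be checked by testing the scaled-out cluster.

```python
import math


def bottleneck(utilization):
    """Return the resource with the highest measured utilization (0.0 to 1.0)."""
    return max(utilization, key=utilization.get)


def nodes_needed(target_mb_s, per_node_max_mb_s, headroom=0.25):
    """First estimate of node count for a target sustained throughput.

    per_node_max_mb_s is the throughput at which a single node hit its
    bottleneck during stress testing. headroom is the fraction of that
    capacity left unused to absorb uneven FlowFile distribution.
    Assumes roughly linear scale-out (an assumption, not a NiFi guarantee).
    """
    if per_node_max_mb_s <= 0:
        raise ValueError("per_node_max_mb_s must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = per_node_max_mb_s * (1 - headroom)
    return math.ceil(target_mb_s / usable_per_node)


# Hypothetical measurements taken on one node at its saturation point.
measured = {"cpu": 0.55, "memory": 0.61, "network": 0.40, "disk_io": 0.92}

print("bottleneck:", bottleneck(measured))
print("nodes for 200 MB/s:", nodes_needed(200, per_node_max_mb_s=80))
```

The numbers fed in only mean something if they come from your own workflows on your own hardware.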

Thanks,

Matt

Correct NiFi sizing