NiFi Data Threshold Limit

Hi,

Is there any limit on the number of data records or files NiFi can process per second? I need a solution that reads millions of records within 10 seconds and writes them to Hadoop.

Thanks,

1 ACCEPTED SOLUTION

Master Mentor

@Shashwat Gaur

NiFi does not limit overall throughput in any way at the software level.

In most cases throughput is limited by CPU, disk I/O, memory, and/or network performance.

I would check whether any of the above are saturated. Following installation best practices is important for maximizing your throughput. At a minimum, placing each of the following on its own physical disk will help (the disks should be set up as RAID arrays to protect your data):

- Content repository(s)

- FlowFile repository

- Provenance repository(s)

- NiFi logging directory
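One way to sanity-check the layout above is to compare the device IDs the operating system reports for each directory. The following is only a minimal Python sketch, not anything shipped with NiFi, and the function names are made up for illustration. Keep in mind that a matching device ID shows two directories share a filesystem, while differing IDs do not by themselves prove separate physical disks (two partitions of one disk also differ).

```python
import os
from collections import defaultdict


def group_by_device(paths, device_of=lambda p: os.stat(p).st_dev):
    """Group directory paths by the device ID of the filesystem they live on."""
    groups = defaultdict(list)
    for path in paths:
        groups[device_of(path)].append(path)
    return dict(groups)


def shared_devices(paths, device_of=lambda p: os.stat(p).st_dev):
    """Return only the groups in which two or more paths share a device."""
    grouped = group_by_device(paths, device_of)
    return {dev: ps for dev, ps in grouped.items() if len(ps) > 1}
```

Calling `shared_devices` with the content, FlowFile, provenance, and log directories of your own install should return an empty dict if no two of them sit on the same filesystem.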

When it comes to controlling throughput in your dataflow, look for bottlenecks and check that you have tuned the concurrent tasks and run schedule of each processor component.

If your CPU is not saturated, consider increasing the number of threads NiFi is allowed to hand out to its processor components. Under "Controller Settings" (found in the hamburger menu in the upper right corner of the NiFi UI), change the value of "Max Timer Driven Thread Count". A good starting point is 2 to 4 times the number of cores on a single NiFi instance (all settings are per node in a cluster). There is also a "Max Event Driven Thread Count" setting, which should be left unchanged. Event-driven threads are experimental and are not used by any NiFi components by default.
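As a quick illustration of the 2 to 4 times rule of thumb, here is a small Python sketch that turns a core count into a starting range. The function name is hypothetical, and the result is only a starting point to enter by hand in the UI, per node.

```python
import os


def timer_driven_thread_range(cores=None):
    """Return (low, high) starting values: 2x and 4x the core count of one node."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 2 * cores, 4 * cores
```

For example, `timer_driven_thread_range(8)` gives `(16, 32)` for an 8-core node. Run it on the NiFi host itself if you rely on the detected core count.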

If you see a lot of garbage collection going on or you are hitting OutOfMemory (heap) exceptions, you may need to increase the heap allocation in NiFi's bootstrap.conf file. You may also need to make dataflow design changes to reduce the heap footprint of your flow.
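To see what heap is currently configured before changing it, something like the Python sketch below can pull the `-Xms` and `-Xmx` values out of the text of bootstrap.conf. It assumes the heap flags appear on `java.arg.N=` lines; the exact argument numbers in the sample (`java.arg.2`, `java.arg.3`) and the 512m values are illustrative assumptions, so check your own file.

```python
import re

# Matches lines such as "java.arg.2=-Xms512m" (argument names are an assumption).
HEAP_FLAG = re.compile(r"^(java\.arg\.[^=\s]+)\s*=\s*(-Xm[sx])(\S+)\s*$")


def heap_settings(conf_text):
    """Map '-Xms'/'-Xmx' to the (property name, size) found in bootstrap.conf text."""
    found = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        match = HEAP_FLAG.match(line)
        if match:
            key, flag, size = match.groups()
            found[flag] = (key, size)
    return found


# Illustrative sample only; real values depend on your install.
SAMPLE = """
# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
"""
```

Raising the heap then means editing the size on the `-Xms` and `-Xmx` lines and restarting NiFi.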

Thank you,

Matt

2 REPLIES

Contributor

Apart from JVM limits, which you can raise, there is no fixed limit on the size or number of FlowFiles or records as such. I would say design your flow first, and if you find that you're being throttled, you can follow some good design practices to tune it.

Take a look at this: https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.h...
