
NiFi ListenTCP: handling data spikes

Contributor

Hi All,

Thanks a lot to this awesome community. In our data flow we have:

ListenTCP -> MergeContent -> UpdateAttribute -> PutHDFS

When we have a data spike like the one shown in the attached picture, all the processors are overwhelmed. I looked at concurrent tasks and increased them, but that still does not fix the problem.

Is there anything we can do to absorb data spikes?

[Attachment: capture2.png]

Thanks

Dhieru


Mentor

@dhieru singh

When you say "all processors" are being overwhelmed, are you saying connection between every single processor is filling and triggering back pressure in your dataflow?

Have you looked at the resources of the hardware running your NiFi instance?

Are CPU, memory, and/or disk I/O becoming saturated during these spikes? If so, there is not much within NiFi's configuration that can help here. In a case like this you would need to expand your NiFi into a cluster.
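To make the "saturated or not" decision concrete, here is a minimal sketch. It assumes you have already sampled CPU, memory, and disk-busy utilization as fractions (from whatever monitoring you use); the 90% threshold and the names `ResourceSample`, `saturated_resources`, and `recommendation` are hypothetical and only illustrate the decision described above.

```python
import os
from dataclasses import dataclass

# Hypothetical threshold: treat a resource as saturated above 90% utilization.
SATURATION_THRESHOLD = 0.90


@dataclass
class ResourceSample:
    cpu: float      # fraction of total CPU capacity in use (0.0 to 1.0)
    memory: float   # fraction of RAM in use (0.0 to 1.0)
    disk_io: float  # fraction of time the disk was busy (0.0 to 1.0)


def saturated_resources(sample, threshold=SATURATION_THRESHOLD):
    """Return the names of resources whose utilization exceeds the threshold."""
    readings = (("cpu", sample.cpu), ("memory", sample.memory), ("disk_io", sample.disk_io))
    return [name for name, value in readings if value > threshold]


def cpu_fraction_from_loadavg():
    """Rough CPU proxy on Unix: 1-minute load average divided by core count.

    Can exceed 1.0 when work is queued, and on Linux the load average also
    counts processes blocked on I/O, so treat it only as a hint.
    """
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1)


def recommendation(sample):
    """Map a sample to the advice in this thread: scale out if saturated, else tune threads."""
    busy = saturated_resources(sample)
    if busy:
        return "saturated: " + ", ".join(busy) + " -> consider scaling out to a NiFi cluster"
    return "headroom available -> tune Max Timer Driven Threads first"
```

For example, `recommendation(ResourceSample(cpu=0.97, memory=0.60, disk_io=0.95))` returns `"saturated: cpu, disk_io -> consider scaling out to a NiFi cluster"`.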

Once clustered, you have two options for your ListenTCP feed.

1. Run the ListenTCP processor on all nodes and place an external load balancer in front of them to distribute the TCP traffic across every node.

2. Have the ListenTCP processor receive data on only one node, but immediately feed the success relationship from that ListenTCP processor to a Remote Process Group (RPG) that redistributes the received FlowFiles to all nodes in your cluster. This spreads out the work done by the rest of the processors in your dataflow(s).
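Both options aim to spread incoming FlowFiles across nodes so a spike is absorbed by the combined capacity of the cluster. The sketch below illustrates that effect with plain round-robin assignment and a second-by-second backlog simulation. It is a simplification: it does not reproduce how a real load balancer or NiFi's Site-to-Site protocol actually chooses nodes, and the arrival rates and per-node capacity are hypothetical numbers chosen for illustration.

```python
from itertools import cycle


def distribute_round_robin(flowfiles, nodes):
    """Assign each FlowFile to the next node in turn (simplified balancing)."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


def peak_backlog(arrivals_per_sec, nodes, capacity_per_node):
    """Simulate queued FlowFiles second by second and return the largest backlog.

    Assumes load is spread evenly across nodes, which is what balancing aims for,
    so total capacity per second is capacity_per_node * nodes.
    """
    backlog = 0
    peak = 0
    for arrivals in arrivals_per_sec:
        backlog = max(0, backlog + arrivals - capacity_per_node * nodes)
        peak = max(peak, backlog)
    return peak


# Hypothetical spike: FlowFiles arriving per second, with a 2-second burst.
spike = [100, 100, 1000, 1000, 100, 100]
print(peak_backlog(spike, nodes=1, capacity_per_node=400))  # single node
print(peak_backlog(spike, nodes=3, capacity_per_node=400))  # 3-node cluster
```

With these made-up numbers a single node peaks at a backlog of 1200 FlowFiles, while three nodes never queue anything, because 3 x 400 per second exceeds the burst rate.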

If your resources are not saturated, make sure you have allocated enough "Max Timer Driven Threads" to your NiFi instance so that your processors can fully utilize the server's CPU resources. The NiFi default is only 10. The Max Timer Driven Thread count can be adjusted in the "Controller Settings" UI, found in the hamburger menu in the upper-right corner.
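The same setting can also be changed through NiFi's REST API by reading and then updating the controller configuration at `/nifi-api/controller/config`. The sketch below assumes an unsecured NiFi instance (a secured one also needs an authentication token, which is omitted here), and the helper names are hypothetical. The 2x-cores multiplier in `suggested_thread_count` is an illustrative assumption, not a NiFi default; size the pool against your own CPU headroom.

```python
import json
import os
import urllib.request


def suggested_thread_count(cores=None, multiplier=2):
    """Hypothetical starting point: a multiple of the CPU core count."""
    cores = cores or os.cpu_count() or 1
    return cores * multiplier


def build_config_update(current_config, max_timer_threads):
    """Build a PUT body from a GET /nifi-api/controller/config response.

    The current revision is echoed back so NiFi can detect concurrent edits.
    """
    return {
        "revision": current_config["revision"],
        "component": {"maxTimerDrivenThreadCount": max_timer_threads},
    }


def update_max_timer_threads(base_url, max_timer_threads):
    """Read the controller config, then PUT the new thread count (unsecured NiFi assumed)."""
    url = f"{base_url}/nifi-api/controller/config"
    with urllib.request.urlopen(url) as resp:
        current = json.load(resp)
    body = json.dumps(build_config_update(current, max_timer_threads)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like `update_max_timer_threads("http://localhost:8080", suggested_thread_count())`, with the host and port adjusted to your installation.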

Note: do not adjust the default Event Driven Thread Count. Doing so only increases the size of a thread pool that is not used by default.

If disk I/O is high, follow best practices and make sure the NiFi logs, Provenance repository(s), Content repository(s), and FlowFile repository are each located on their own physical disk.
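A quick way to spot repositories that share storage is to read the repository directories from `nifi.properties` and compare the device IDs of those paths. This is a minimal sketch under a few assumptions: it only handles simple `key=value` lines (not every `.properties` syntax), it resolves relative paths against a NiFi home directory you pass in, and the log directory is supplied by hand because it is not set in `nifi.properties`. Note that `st_dev` identifies the mounted filesystem, not the physical disk, so two partitions of one disk will still look separate. The function names are hypothetical.

```python
import os

# Repository directory keys from nifi.properties (default single-directory keys).
REPO_KEYS = (
    "nifi.flowfile.repository.directory",
    "nifi.content.repository.directory.default",
    "nifi.provenance.repository.directory.default",
)


def read_properties(path):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def shared_devices(props, log_dir, base_dir="."):
    """Return groups of locations (repositories plus logs) that sit on the same filesystem."""
    dirs = {key: props[key] for key in REPO_KEYS if key in props}
    dirs["logs"] = log_dir
    by_device = {}
    for name, directory in dirs.items():
        # Absolute paths are kept as-is; relative ones resolve against base_dir.
        full_path = os.path.join(base_dir, directory)
        device = os.stat(full_path).st_dev
        by_device.setdefault(device, []).append(name)
    return [sorted(names) for names in by_device.values() if len(names) > 1]
```

An empty result means every location is on its own filesystem; any returned group is a candidate for moving to a separate disk.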

Thank you,

Matt
