Suggestions to handle high volume streaming data in NiFi

Expert Contributor

Hi guys,

I have a use case where we need to load near-real-time streaming data into HDFS. The incoming volume is high, about 1500 messages per second. In my NiFi dataflow, a ListenTCP processor ingests the stream, but each incoming message has to be checked for the required structure, so messages from ListenTCP go to a custom processor that does the structure check. Only messages with the right structure move on to the MergeContent processor and then to PutHDFS. Right now the validation processor has become a bottleneck, and the backpressure from it is causing ListenTCP to leave messages queued at the source system (the one sending the messages).

Since the validation processor cannot keep up with the incoming data, I'm thinking of writing the messages from ListenTCP to the file system first, and then letting the validation processor pick them up from the file system and continue forward. Is this the right approach to resolve this? Are there any alternatives you would suggest?

Thanks in advance.

1 ACCEPTED SOLUTION

Master Guru

See above comments.

The main fix is to increase your JVM memory. If you give NiFi 12-16 GB, you should be in good shape.
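
For reference, NiFi's JVM heap is normally set in conf/bootstrap.conf. The fragment below is only a sketch: it assumes the stock file, where the heap arguments are numbered java.arg.2 and java.arg.3 (check your own file, since the numbering can differ), and it uses 12g simply because that is the low end of the range suggested here. NiFi needs a restart for the change to take effect.

```
# conf/bootstrap.conf (assumed default argument numbering)
# Initial heap size
java.arg.2=-Xms12g
# Maximum heap size
java.arg.3=-Xmx12g
```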

If it's a VM environment, give the node 16-32 or more cores. If that's not enough, go to multiple nodes in the cluster.
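
To see why more cores help, it can be useful to work out how much time the validation step may spend per message. The sketch below is back-of-the-envelope arithmetic only: `per_message_budget_ms` is a hypothetical helper, and it assumes the processor's concurrent tasks scale perfectly across cores, which real workloads rarely achieve.

```python
def per_message_budget_ms(messages_per_second: float, concurrent_tasks: int = 1) -> float:
    """Average time (ms) each task may spend on one message and still keep up.

    Assumes perfect parallel scaling across concurrent tasks (an idealization).
    """
    return 1000.0 * concurrent_tasks / messages_per_second


if __name__ == "__main__":
    # 1500 messages/sec is the rate quoted in the question.
    for tasks in (1, 4, 16):
        budget = per_message_budget_ms(1500, tasks)
        print(f"{tasks:>2} concurrent task(s): {budget:.3f} ms per message")
```

With a single task the validation has well under a millisecond per message at 1500 messages/sec, so raising the processor's concurrent tasks (and giving the node enough cores to run them) is the first thing to try before adding nodes.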

One node should easily scale to 10k messages/sec. How big are these files? Is anything failing? Any errors in the logs?

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

10 REPLIES

Master Guru

Bigger files are better than millions of little ones.
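
A rough count shows the scale of the difference. This is a sketch, not a tuning recommendation: `files_per_day` is a hypothetical helper, and the bundle size of 10,000 messages per file is an assumed figure for illustration, not a value from this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_day(messages_per_second: int, messages_per_file: int) -> int:
    """Number of files written per day if messages are bundled before writing."""
    total_messages = messages_per_second * SECONDS_PER_DAY
    # Ceiling division: a partially filled last bundle still becomes a file.
    return -(-total_messages // messages_per_file)


if __name__ == "__main__":
    rate = 1500  # messages/sec, as quoted in the question
    print("one file per message:", files_per_day(rate, 1))
    # 10_000 messages per merged file is an assumed bundle size.
    print("merged bundles:      ", files_per_day(rate, 10_000))
```

At the quoted rate, writing each message as its own file means well over a hundred million files a day, while bundling them first (which is what the MergeContent step in the flow is for) cuts that to thousands.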