NiFi - publishing millions of 100-byte messages with PublishKafka

Expert Contributor

Hi All,

Thanks a lot to this awesome community.

My use case is as follows:

We use the ListenTCP processor to listen for small firewall events (maximum size 200 bytes). We batch them at the ListenTCP level to improve network I/O and performance. Then we split them into individual flow files, do some processing and enrichment, and publish them to Kafka.

However, our Kafka is not able to keep up with the rate of flow. We have millions of flowfiles queued up.

I read that Kafka can publish millions of small messages per second.

Which properties do I need to configure in PublishKafka to increase its performance?

I tried tuning max.request.size; however, it is the maximum size of each message, so it did not help.

I tried adding one more property, buffer.memory, to buffer small messages together before publishing, but that did not help either.

Should I also add the property batch.size (controls how many bytes of data to collect before sending messages to the Kafka broker. Set this as high as possible, without exceeding available memory. The default value is 16384.) and

linger.ms (sets the maximum time to buffer data in asynchronous mode. For example, a setting of 100 batches 100 ms of messages to send at once. This improves throughput, but the buffering adds message delivery latency.)?
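
To make the interaction between batch.size and linger.ms concrete, here is a minimal Python sketch of a simplified batching model. It is not the Kafka producer's actual implementation: the function name simulate_batches and the arrival pattern (one 100-byte message per millisecond) are hypothetical, per-record overhead and per-partition batching are ignored, and a batch is simply closed when the linger time has elapsed or when the next record would not fit.

```python
def simulate_batches(arrivals, batch_size=16384, linger_ms=0):
    """Return the number of records in each batch under a simplified model.

    arrivals: list of (timestamp_ms, size_bytes) tuples, sorted by timestamp.
    batch_size: byte limit per batch (plays the role of batch.size).
    linger_ms: maximum time a batch stays open (plays the role of linger.ms).
    """
    batches = []
    count = 0            # records in the currently open batch
    pending_bytes = 0    # bytes in the currently open batch
    started_at = None    # timestamp of the first record in the open batch

    for now_ms, size in arrivals:
        # Close the open batch if it has lingered long enough.
        if count and now_ms - started_at >= linger_ms:
            batches.append(count)
            count, pending_bytes = 0, 0
        # Close the open batch if this record would overflow it.
        if count and pending_bytes + size > batch_size:
            batches.append(count)
            count, pending_bytes = 0, 0
        if count == 0:
            started_at = now_ms
        count += 1
        pending_bytes += size

    if count:
        batches.append(count)  # flush whatever is left at the end
    return batches


# Hypothetical load: 1000 messages of 100 bytes, one per millisecond.
arrivals = [(t, 100) for t in range(1000)]
no_linger = simulate_batches(arrivals, batch_size=16384, linger_ms=0)
with_linger = simulate_batches(arrivals, batch_size=16384, linger_ms=100)
print(len(no_linger), len(with_linger))
```

In this model, a linger of 0 sends every message on its own, while a linger of 100 ms groups the same traffic into far fewer, larger sends. With a long enough linger, the byte limit becomes the factor that closes each batch.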

In my opinion, I should buffer all the small messages (at least 100,000 messages) and then write to the Kafka topic; this will improve network I/O and mean fewer writes. I am just not sure which properties will help me here.
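
As a rough sizing check for the idea above, the following Python sketch works out how many bytes a 100,000-message buffer would need and how many 100-byte messages fit in one batch at the default batch.size of 16384 quoted earlier. The helper names are hypothetical, and the arithmetic ignores per-record and per-batch overhead, so the real numbers would be somewhat less favorable.

```python
import math


def bytes_needed(n_messages, message_bytes):
    """Payload bytes required to hold n_messages of a fixed size."""
    return n_messages * message_bytes


def messages_per_batch(batch_size, message_bytes):
    """How many fixed-size messages fit in one batch, ignoring overhead."""
    return batch_size // message_bytes


def batches_required(n_messages, batch_size, message_bytes):
    """Number of batches needed to send n_messages, ignoring overhead."""
    return math.ceil(n_messages / messages_per_batch(batch_size, message_bytes))


# Figures from the post: 100,000 messages of about 100 bytes each,
# and the default batch.size of 16384 bytes.
print(bytes_needed(100_000, 100))             # payload bytes for one large batch
print(messages_per_batch(16384, 100))         # messages per default-sized batch
print(batches_required(100_000, 16384, 100))  # default-sized batches needed
```

Under these assumptions, holding 100,000 messages in a single batch would take about 10 MB of payload, whereas a default-sized batch holds only 163 such messages.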

Thanks

Dhieru

[Attached screenshot: 78492-39958-qkakfak.png]

2 REPLIES

Super Collaborator

I think you've seen this blog post already but just in case you haven't:

https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka

You'll want to understand whether the bottleneck is on the Kafka side or on the NiFi side so you can understand where to appropriately tune. How many NiFi nodes do you have? How many Kafka nodes? How many partitions for your Kafka topic? The blog post above goes into detail on how to match partitions with NiFi nodes and concurrent tasks as well.

Expert Contributor

@anarasimham Thanks for the response. If I add custom properties such as linger.ms, batch.size, and buffer.memory, will the PublishKafka processor honor and use them? I checked the source code of the PublishKafka processor but did not find any mention of these properties.

Thanks again

Dhieru