Flume's Kafka Sink - Latency to reach the Queue

Good morning everyone!

I've been trying to use Flume's Kafka sink to send some transactional information to another system that consumes the Kafka queue.

The problem is not Flume's performance (as far as I know): every message sent to Flume is consumed and passed to the Kafka sink. However, the message does not appear in the Kafka queue until about 3 seconds later, which is too long.

I think it might be a Kafka sink configuration issue, but I'm not sure.

My Flume setup is as follows:

- Memory channel
- Custom source (the source pulls data from a database and sends it through the channel)
- Kafka sink

I measure the time to reach the Kafka queue starting from when the source sends the message to the channel. This agent does not have to handle many messages (around 1-2 messages per second); however, I'm concerned about how long they take to reach the Kafka queue.

This is my Kafka sink configuration:

a3.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a3.sinks.k1.brokerList = sbmdeqpc02:9092,sbmdeqpc03:9092,sbmdeqpc04:9092
a3.sinks.k1.topic = aud-50
a3.sinks.k1.batchSize = 10

I've tried changing the batchSize setting, but it doesn't seem to affect the latency.

This is the description of the topic/queue:

Topic:aud-50 PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=86400000
Topic: aud-50 Partition: 0 Leader: 183 Replicas: 183 Isr: 183
Topic: aud-50 Partition: 1 Leader: 181 Replicas: 181 Isr: 181
Topic: aud-50 Partition: 2 Leader: 182 Replicas: 182 Isr: 182

Has anyone else had this issue, with a Kafka sink taking too long to put messages on the queue?

Any help is welcome. Thanks in advance.

Kind regards.

Rafa

Hi Rafa,

Sorry to hear you are having trouble with performance. I suspect you are on the right track when it comes to batch sizes, but you may need some further tuning.

Could you start by posting the whole of your agent.conf (i.e. including sources and channels), as it's possible the latency is being introduced elsewhere? Also, what version of Flume/CDH are you running? The configuration of Kafka sinks changed quite dramatically in Flume 1.7 (with the relevant Kafka bits also featuring in CDH 5.8+).

There are some performance tuning tips in http://blog.cloudera.com/blog/2016/08/new-in-cloudera-enterprise-5-8-flafka-improvements-for-real-ti... (although they are geared towards increasing throughput rather than decreasing latency, some of the settings there will still be relevant).

As a bit of simple maths: if you are expecting 1-2 messages per second with a batch size of 10, it could be waiting 5-10 seconds for a full batch to arrive, and therefore before sending it on. In this instance I'd look to tune the batch sizes down to 1 across the board, to ensure that messages are passed on as soon as they are received.

Please give that a try, and post some more details about your config and we'll see if we can help.


Tristan
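Tristan's back-of-the-envelope estimate can be written out as a small Python sketch. It assumes, as his reasoning does, a steady arrival rate and a sink that holds events until it has a full batch; the function names are hypothetical, and the sketch does not model Flume's actual sink loop or any buffering inside the Kafka producer.

```python
def batch_fill_seconds(batch_size: int, msgs_per_sec: float) -> float:
    """Time to accumulate a full batch at a steady arrival rate (assumed model)."""
    if batch_size < 1 or msgs_per_sec <= 0:
        raise ValueError("batch_size must be >= 1 and msgs_per_sec > 0")
    return batch_size / msgs_per_sec


def oldest_event_wait_seconds(batch_size: int, msgs_per_sec: float) -> float:
    """Extra time the first event of a batch waits for the rest to arrive."""
    if batch_size < 1 or msgs_per_sec <= 0:
        raise ValueError("batch_size must be >= 1 and msgs_per_sec > 0")
    return (batch_size - 1) / msgs_per_sec


if __name__ == "__main__":
    # Compare the batch size from the question (10) with the suggested one (1)
    # at the 1-2 messages per second mentioned in the thread.
    for batch_size in (10, 1):
        for rate in (1.0, 2.0):
            print(
                f"batchSize={batch_size:>2}, {rate:.0f} msg/s: "
                f"fill {batch_fill_seconds(batch_size, rate):.1f} s, "
                f"oldest event waits {oldest_event_wait_seconds(batch_size, rate):.1f} s"
            )
```

With batchSize = 10 at 1-2 messages per second, the fill time comes out at 5-10 s, in line with the estimate above, whereas with batchSize = 1 no event has to wait for any other.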

Hello Tristan

Thanks a lot for your response. As you said, the issue was the batchSize configuration of the Kafka sink. Given that I only expect a couple of messages per second, a batchSize of 10 was not needed. Setting batchSize to 1 solved the "latency" I was seeing. I guess that if messages arrived at a rate of thousands per second, a larger batchSize could be much more efficient. In the end I guess it was more of a PEBKAC problem than a Flume problem, haha! 😛

Just to let you know, I'm using a somewhat "older" distribution (CDH 5.5), so I don't have the newer performance improvements you linked to; however, changing the batchSize configuration removed the problem, as I said. We are planning to upgrade our distribution in the coming months, so I hope to use the newer performance enhancements soon!

Again, thanks a lot for your help and have a nice day!

Rafa
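The trade-off Rafa describes (small batches at low rates, larger batches at thousands of messages per second) can be framed as picking the largest batch that still fills within a latency budget. Here is a minimal Python sketch under the same full-batch assumption as Tristan's arithmetic; `max_batch_for_budget` and the 0.5 s budget are illustrative choices, not Flume settings.

```python
import math


def max_batch_for_budget(msgs_per_sec: float, latency_budget_s: float) -> int:
    """Largest batch size whose fill time (batch / rate) fits the budget.

    Assumes a steady arrival rate and a sink that waits for a full batch.
    Never returns less than 1, because a batch always holds at least one event.
    """
    if msgs_per_sec <= 0 or latency_budget_s <= 0:
        raise ValueError("rate and budget must both be positive")
    return max(1, math.floor(msgs_per_sec * latency_budget_s))


if __name__ == "__main__":
    budget = 0.5  # hypothetical latency budget in seconds
    for rate in (1.0, 2.0, 1000.0):
        print(f"{rate:>6.0f} msg/s -> batchSize <= {max_batch_for_budget(rate, budget)}")
```

With a 0.5 s budget this gives 1 for the 1-2 messages per second in this thread and 500 at 1,000 messages per second, consistent with the idea that a larger batchSize mainly pays off at much higher rates.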