<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Flume's Kafka Sink - Latency to reach the Queue in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48322#M51203</link>
    <description>&lt;P&gt;Hi Rafa,&lt;/P&gt;&lt;P&gt;Sorry to hear you are having trouble with performance. I suspect you are on the right track when it comes to batch sizes, but you may need some further tuning.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you start by posting the whole of your agent.conf (i.e. including sources and channels)? It's possible the latency is being introduced elsewhere. Also, what version of Flume/CDH are you running? The configuration of Kafka sinks changed quite dramatically in Flume 1.7 (with the relevant Kafka bits also featuring in CDH 5.8+).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are some performance tuning tips in&amp;nbsp;&lt;A href="http://blog.cloudera.com/blog/2016/08/new-in-cloudera-enterprise-5-8-flafka-improvements-for-real-time-data-ingest/" target="_blank"&gt;http://blog.cloudera.com/blog/2016/08/new-in-cloudera-enterprise-5-8-flafka-improvements-for-real-time-data-ingest/&lt;/A&gt; (although they are geared towards increasing throughput rather than decreasing latency, some of the settings there will be relevant).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As a bit of simple maths: if you are expecting 1-2 messages per second with a batch size of 10, it could take 5-10 seconds for a full batch of 10 messages to arrive (10 / 2 = 5 s; 10 / 1 = 10 s), and therefore before anything is sent on. In this instance I'd be looking to tune the batch sizes down to 1 across the board to ensure that messages are passed on as soon as they are received.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please give that a try and post some more details about your config, and we'll see if we can help.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Tristan&lt;/P&gt;</description>
    <pubDate>Mon, 05 Dec 2016 18:51:48 GMT</pubDate>
    <dc:creator>tristan</dc:creator>
    <dc:date>2016-12-05T18:51:48Z</dc:date>
    <item>
      <title>Flume's Kafka Sink - Latency to reach the Queue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48317#M51202</link>
      <description>&lt;P&gt;Good morning everyone!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've been trying to use Flume's Kafka sink to send some transactional information to another system that consumes the Kafka queue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem is not the performance of Flume (as far as I know): any message that is sent to Flume is consumed and passed to the Kafka sink; however, the message does not appear in the Kafka queue for about 3 seconds. It takes too long for the message to be seen in the Kafka queue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think it might be a Kafka sink configuration issue, but I'm not sure.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My Flume setup is like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;- Memory channel&lt;/P&gt;&lt;P&gt;- Custom source (the source pulls data from a database and sends the information through the channel)&lt;/P&gt;&lt;P&gt;- Kafka sink&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I measure the time to reach the Kafka queue starting from the moment the source sends the message to the channel. This agent does not have to handle a lot of messages (around 1-2 messages per second); however, I'm concerned about the time it takes for them to reach the Kafka queue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is my Kafka sink configuration:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;a3.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink&lt;BR /&gt;a3.sinks.k1.brokerList = sbmdeqpc02:9092,sbmdeqpc03:9092,sbmdeqpc04:9092&lt;BR /&gt;a3.sinks.k1.topic = aud-50&lt;BR /&gt;a3.sinks.k1.batchSize = 10&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried changing the batchSize configuration, but it doesn't seem to change the latency.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the description of the topic/queue:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Topic:aud-50 PartitionCount:3 ReplicationFactor:1 Configs:retention.ms=86400000&lt;BR /&gt;Topic: aud-50 Partition: 0 Leader: 183 Replicas: 183 Isr: 183&lt;BR /&gt;Topic: aud-50 Partition: 1 Leader: 181 Replicas: 181 Isr: 181&lt;BR /&gt;Topic: aud-50 Partition: 2 Leader: 182 Replicas: 182 Isr: 182&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Has anyone else had this issue, a Kafka sink taking too long to put messages on the queue?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help is welcome. Thanks for your help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Rafa&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:50:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48317#M51202</guid>
      <dc:creator>rilarios</dc:creator>
      <dc:date>2022-09-16T10:50:17Z</dc:date>
    </item>
    <item>
      <title>Re: Flume's Kafka Sink - Latency to reach the Queue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48322#M51203</link>
      <description>&lt;P&gt;Hi Rafa,&lt;/P&gt;&lt;P&gt;Sorry to hear you are having trouble with performance. I suspect you are on the right track when it comes to batch sizes, but you may need some further tuning.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you start by posting the whole of your agent.conf (i.e. including sources and channels)? It's possible the latency is being introduced elsewhere. Also, what version of Flume/CDH are you running? The configuration of Kafka sinks changed quite dramatically in Flume 1.7 (with the relevant Kafka bits also featuring in CDH 5.8+).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are some performance tuning tips in&amp;nbsp;&lt;A href="http://blog.cloudera.com/blog/2016/08/new-in-cloudera-enterprise-5-8-flafka-improvements-for-real-time-data-ingest/" target="_blank"&gt;http://blog.cloudera.com/blog/2016/08/new-in-cloudera-enterprise-5-8-flafka-improvements-for-real-time-data-ingest/&lt;/A&gt; (although they are geared towards increasing throughput rather than decreasing latency, some of the settings there will be relevant).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As a bit of simple maths: if you are expecting 1-2 messages per second with a batch size of 10, it could take 5-10 seconds for a full batch of 10 messages to arrive (10 / 2 = 5 s; 10 / 1 = 10 s), and therefore before anything is sent on. In this instance I'd be looking to tune the batch sizes down to 1 across the board to ensure that messages are passed on as soon as they are received.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please give that a try and post some more details about your config, and we'll see if we can help.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Tristan&lt;/P&gt;</description>
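The batch-size arithmetic in the reply can be sketched in Python. This is a simplified worst-case model and an assumption, not a description of Flume's internal sink loop: it treats the sink as holding events until batchSize of them have arrived and only then sending them. The function names are hypothetical, chosen for illustration.

```python
def _validate(batch_size: int, messages_per_second: float) -> None:
    # Reject inputs for which the model is meaningless.
    if not (batch_size >= 1 and messages_per_second > 0):
        raise ValueError("batch_size must be at least 1 and the rate must be positive")


def batch_fill_seconds(batch_size: int, messages_per_second: float) -> float:
    """Time for a full batch to accumulate at a steady arrival rate.

    Simplified model (assumption): nothing is sent until the batch is full.
    """
    _validate(batch_size, messages_per_second)
    return batch_size / messages_per_second


def first_message_wait_seconds(batch_size: int, messages_per_second: float) -> float:
    """Extra delay for the first message of a batch, which waits for the
    remaining batch_size - 1 messages under the same simplified model."""
    _validate(batch_size, messages_per_second)
    return (batch_size - 1) / messages_per_second


if __name__ == "__main__":
    # Rates from the thread: roughly 1-2 messages per second.
    for rate in (1.0, 2.0):
        for batch in (10, 1):
            print(
                f"batchSize={batch}, {rate:g} msg/s: "
                f"batch fills in {batch_fill_seconds(batch, rate):.1f} s, "
                f"first message waits {first_message_wait_seconds(batch, rate):.1f} s"
            )
```

With batchSize = 10 the model gives a 5-10 s fill time, matching the estimate in the reply, and the first message of each batch waits 4.5-9 s; with batchSize = 1 the batching delay drops to zero. At high rates the picture changes: at 1000 messages per second a batch of 10 fills in 10 / 1000 = 0.01 s, so a larger batch adds little latency there.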
      <pubDate>Mon, 05 Dec 2016 18:51:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48322#M51203</guid>
      <dc:creator>tristan</dc:creator>
      <dc:date>2016-12-05T18:51:48Z</dc:date>
    </item>
    <item>
      <title>Re: Flume's Kafka Sink - Latency to reach the Queue</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48324#M51204</link>
      <description>Hello Tristan,&lt;BR /&gt;&lt;BR /&gt;Thanks a lot for your response. As you said, the issue was the batchSize configuration of the Kafka sink. Given that I only expected a couple of messages per second, a batchSize of 10 was not needed. Setting batchSize to 1 solved the "latency" I was seeing. I guess that if the messages arrived at a rate of thousands per second, a larger batchSize could be much more efficient. In the end I guess it was more of a PEBKAC problem than a Flume problem, haha! &lt;span class="lia-unicode-emoji" title=":face_with_tongue:"&gt;😛&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Just to let you know, I'm using a somewhat "older" distribution (CDH 5.5), so I don't have the newer performance improvements you linked to; however, the problem was resolved by changing the batchSize configuration, as I said before. We are planning to upgrade our distribution in the coming months, so I hope to use the newer performance enhancements soon!&lt;BR /&gt;&lt;BR /&gt;Again, thanks a lot for your help and have a nice day!&lt;BR /&gt;&lt;BR /&gt;Rafa&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 05 Dec 2016 20:18:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Flume-s-Kafka-Sink-Latency-to-reach-the-Queue/m-p/48324#M51204</guid>
      <dc:creator>rilarios</dc:creator>
      <dc:date>2016-12-05T20:18:16Z</dc:date>
    </item>
  </channel>
</rss>

