Created 03-25-2018 04:03 PM
Hi guys.
I am facing a strange problem.
I have a 3-node Kafka 0.10 cluster, and on one machine, for one specific partition, I always get the errors below. As a result, the lag on that partition between the source cluster and the destination cluster keeps increasing.
[2018-03-25 17:56:57,398] WARN [ConsumerFetcherThread-KafkaMirror_hdp-dw-1-nn-1.domain.local-1521821432752-5248d2cb-0-1001], Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@74958add (kafka.consumer.ConsumerFetcherThread)
java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:122)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:114)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:99)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:148)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:148)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:148)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:147)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:147)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:147)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:146)
    at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:111)
    at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:30)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{metadata.broker.list=dcluster-3.pro2.domain.local:6667,dcluster-1.pro2.domain.local:6667,dcluster-2.pro2.domain.local:6667, request.timeout.ms=30000, client.id=KafkaMirror-0, security.protocol=PLAINTEXT}
[2018-03-25 17:57:28,334] WARN [ConsumerFetcherThread-KafkaMirror_hdp-dw-1-nn-1.domain.local-1521821432752-5248d2cb-0-1001], Error in fetch kafka.consumer.ConsumerFetcherThread$FetchRequest@4acd8894 (kafka.consumer.ConsumerFetcherThread)
java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:122)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:114)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:99)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:148)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:148)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:148)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:147)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:147)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:147)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:146)
    at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:111)
    at kafka.consumer.ConsumerFetcherThread.fetch(ConsumerFetcherThread.scala:30)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
{metadata.broker.list=dcluster-3.pro2.domain.local:6667,dcluster-1.pro2.domain.local:6667,dcluster-2.pro2.domain.local:6667, request.timeout.ms=30000, client.id=KafkaMirror-0, security.protocol=PLAINTEXT}
The other nodes in the cluster work perfectly, and the faulty node works perfectly for other topics. Only partition 0 of topic "transactions" is affected.
Created 03-26-2018 09:54 AM
Answering my own question.
It seems to be caused by the two network bonds (one for access and another for data): if you add entries for some hosts to the hosts file to avoid DNS resolution, it does not work in the long run and throws that error, which is strange.
I removed the source Kafka cluster from the hosts file and now it runs smoothly.
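In case it helps anyone hitting the same thing, here is a minimal sketch of how one might check whether any brokers from metadata.broker.list are pinned in a hosts file. The function names parse_hosts and pinned_brokers are made up for illustration, and the script only reads hosts-file text; it does not verify which bond or IP the traffic actually uses.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its pinned IP address."""
    pinned = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name (assumed lookup order).
            pinned.setdefault(name.lower(), ip)
    return pinned


def pinned_brokers(hosts_text, broker_list):
    """Return {host: ip} for brokers in a 'host:port,host:port' list
    that have an entry in the given hosts-file text."""
    pinned = parse_hosts(hosts_text)
    result = {}
    for entry in broker_list.split(","):
        host = entry.strip().rsplit(":", 1)[0].lower()  # strip the port
        if host in pinned:
            result[host] = pinned[host]
    return result
```

To use it, read /etc/hosts into a string and pass it together with the metadata.broker.list value from the log above. Any host it returns is being resolved from the hosts file instead of DNS and is a candidate for removal.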