Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

run-example streaming.KafkaWordCount fails on CDH 5.7.0

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

Master Collaborator

Ah sorry we are back to run-example. It should work. I just tried it on my CDH 5.7 cluster with Kafka 0.9 and it works as expected. Are you sure you don't have old libs, parcels, config left around?

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

New Contributor

Does this  imply we have to upgrade Kafka broker version to 0.9.0.0 as well as Kafka dependency on the Spark side to use 0.9.0.0 ? 

 

We are directly using kafka through Spark streaming and running into following exception on switching to CDH 5.7.0. I updated the pom of my spark app to use 0.9.0.0 but that did not help.

 

We were running Spark 1.6.0 and Kafka 0.8.2.1 and are just moving to CDH 5.7.0. Upgrading Kafka to 0.9.0.0 would be the last thing we want to do since it requires moving all the components in our system to use that version. So in other words, is it true in order to use CDH 5.7.0, Kafka brokers have to be at 0.9.0.0 ?

 

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_${scala.binary.version}</artifactId>
<version>1.6.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_${scala.binary.version}</artifactId>
<version>0.9.0.0</version>
</dependency>

 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 200.0 failed 4 times, most recent failure: Lost task 0.3 in stage 200.0 (TID 203,..): java.nio.BufferUnderflowException
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
    at java.nio.ByteBuffer.get(ByteBuffer.java:715)
    at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:40)
    at kafka.api.TopicData$.readFrom(FetchResponse.scala:96)
    at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)
    at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.immutable.Range.foreach(Range.scala:141)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)
Highlighted

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

Master Collaborator

Yes it's my understanding that to use Spark Streaming with Kafka in CDH 5.7, you will need to run Kafka 0.9 (aka the 2.x parcel from Cloudera): http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_kafka_ki.html  

 

I think the motivation was that this is required to have any hope of enabling security for Kafka, and because of 0.8/0.9 incompatibilities. 0.8 is a bit old at this stage too. 

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

New Contributor

I have exactly same problem.

 

Is there anyone can run kafka consumer in CDH 5.7.0?