Reply
New Contributor
Posts: 5
Registered: ‎05-04-2016

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

But I want to use the provided example "streaming.KafkaWorkCount" which should not have any issues, wright?
Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

Ah sorry we are back to run-example. It should work. I just tried it on my CDH 5.7 cluster with Kafka 0.9 and it works as expected. Are you sure you don't have old libs, parcels, config left around?

New Contributor
Posts: 1
Registered: ‎05-27-2016

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

Does this  imply we have to upgrade Kafka broker version to 0.9.0.0 as well as Kafka dependency on the Spark side to use 0.9.0.0 ? 

 

We are directly using kafka through Spark streaming and running into following exception on switching to CDH 5.7.0. I updated the pom of my spark app to use 0.9.0.0 but that did not help.

 

We were running Spark 1.6.0 and Kafka 0.8.2.1 and are just moving to CDH 5.7.0. Upgrading Kafka to 0.9.0.0 would be the last thing we want to do since it requires moving all the components in our system to use that version. So in other words, is it true in order to use CDH 5.7.0, Kafka brokers have to be at 0.9.0.0 ?

 

<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_${scala.binary.version}</artifactId>
<version>1.6.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_${scala.binary.version}</artifactId>
<version>0.9.0.0</version>
</dependency>

 

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 200.0 failed 4 times, most recent failure: Lost task 0.3 in stage 200.0 (TID 203,..): java.nio.BufferUnderflowException
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:151)
    at java.nio.ByteBuffer.get(ByteBuffer.java:715)
    at kafka.api.ApiUtils$.readShortString(ApiUtils.scala:40)
    at kafka.api.TopicData$.readFrom(FetchResponse.scala:96)
    at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:170)
    at kafka.api.FetchResponse$$anonfun$4.apply(FetchResponse.scala:169)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.immutable.Range.foreach(Range.scala:141)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at kafka.api.FetchResponse$.readFrom(FetchResponse.scala:169)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:135)
Highlighted
Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: run-example streaming.KafkaWordCount fails on CDH 5.7.0

Yes it's my understanding that to use Spark Streaming with Kafka in CDH 5.7, you will need to run Kafka 0.9 (aka the 2.x parcel from Cloudera): http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_kafka_ki.html  

 

I think the motivation was that this is required to have any hope of enabling security for Kafka, and because of 0.8/0.9 incompatibilities. 0.8 is a bit old at this stage too.