Using kafka 0.10 with CDH 5.9 Spark 2.0 cloudera beta

New Contributor

Hi,

 

I need to use Kafka 0.10 with CDH 5.9 for various reasons.

 

CDH 5.9 comes bundled with Kafka 0.9 and has the Kafka 0.9 jar in SPARK_HOME.

Including the Kafka 0.10 jar in the Spark fat (uber) jar does not automatically override the one in SPARK_HOME, which results in the following exception:

 

17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka version : 0.9.0-kafka-2.0.0

17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka commitId : unknown

Exception in thread "streaming-start" java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V

 

One way to fix this is to move the Kafka 0.9 jars out of SPARK_HOME. Is there any other way to do it?

I tried using spark.executor.userClassPathFirst, but it gives me a different error.
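Before moving anything out of SPARK_HOME, it may help to see which Kafka jars are actually there. The following is a minimal sketch, not a Cloudera tool: it assumes the jars have file names beginning with "kafka-" or "kafka_" (for example kafka-clients-<version>.jar), and the function name find_kafka_jars is made up for illustration. Jars with other prefixes, such as the Spark streaming integration jars, are deliberately not matched.

```python
import os
import re
import sys
from pathlib import Path

# Assumption: Kafka jars are named kafka-<something>.jar or kafka_<something>.jar.
KAFKA_JAR = re.compile(r"^kafka[-_].*\.jar$")


def find_kafka_jars(root):
    """Return the sorted paths of jar files under root whose name starts with 'kafka'."""
    return sorted(
        str(path)
        for path in Path(root).rglob("*.jar")
        if KAFKA_JAR.match(path.name)
    )


if __name__ == "__main__":
    # Use the directory given on the command line, else SPARK_HOME, else the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("SPARK_HOME", ".")
    for jar in find_kafka_jars(root):
        print(jar)
```

Running the script with the Spark installation directory as its argument prints one path per line, which makes it easier to tell whether a 0.9 client jar is sitting next to the one packaged in the application jar.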

 

 

 

 

4 REPLIES

Expert Contributor

Spark 2 adds support for Kafka 0.10. Would it be possible to use Spark 2?  http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

 

The Kafka client changed significantly in 0.10, so replacing the client jars will be difficult.  Using Spark 2 will be the easiest path.

New Contributor

We are using Spark 2 on CDH 5.10. It was installed using the instructions at http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

But this version pulls in the 0.9 version of the Kafka jars, and we need Kafka 0.10.

 

 

Expert Contributor

Yes, you're right, Cloudera does package the 0.9 version of the Apache Kafka client libraries with Spark 2.  Kafka 0.10 brokers are supposed to be able to support the 0.9 client. Is there a specific error you are receiving, or are you just looking for some of the new features added to the 0.10 client?

 

Unfortunately, if you need the 0.10 client version, you will need to either replace the 0.9 client jar with the 0.10 client jar as you described, or use a separate Spark binary that is not managed by Cloudera Manager, making sure to include the YARN and Hive configurations when launching the unmanaged binaries.

Moderator

Based on the documentation "When running jobs that require the new Kafka integration, set SPARK_KAFKA_VERSION=0.10 in the shell before launching spark-submit. Use the appropriate environment variable syntax for your shell"...

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html#running_jobs
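When the job is launched from a script rather than an interactive shell, the same variable can be set for the child process only. The sketch below is one way this might look in Python. SPARK_KAFKA_VERSION and the value 0.10 come from the documentation quoted above; the helper names, the application jar, the main class, and the default "spark-submit" command name are placeholders to adapt to your installation.

```python
import os
import subprocess


def spark_submit_env(base_env=None, kafka_version="0.10"):
    """Return a copy of the environment with SPARK_KAFKA_VERSION set.

    The original mapping is left untouched, so the setting only affects
    processes started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_KAFKA_VERSION"] = kafka_version
    return env


def launch(app_jar, main_class, submit_cmd="spark-submit"):
    """Run the submit command with SPARK_KAFKA_VERSION=0.10 and return its exit code.

    app_jar, main_class and submit_cmd are hypothetical placeholders,
    not values taken from this thread.
    """
    cmd = [submit_cmd, "--class", main_class, app_jar]
    return subprocess.call(cmd, env=spark_submit_env())
```

Calling launch("my-app.jar", "com.example.Main") would then be roughly equivalent to exporting the variable in the shell and running the submit command by hand.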


Ferenc Erdelyi, Technical Solutions Manager
