Created on 01-31-2017 09:03 AM - edited 09-16-2022 03:59 AM
Hi,
Need to use Kafka 0.10 for various reasons with CDH 5.9.
CDH 5.9 comes bundled with Kafka 0.9 and ships the Kafka 0.9 jar in SPARK_HOME.
Having the Kafka 0.10 jar in the Spark fat (uber) jar does not automatically override the one in SPARK_HOME, resulting in the following exception:
17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka version : 0.9.0-kafka-2.0.0
17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka commitId : unknown
Exception in thread "streaming-start" java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V
One way to do this is to move the Kafka 0.9 jars out of SPARK_HOME. Is there any other way to do this?
I tried using spark.executor.userClassPathFirst, but it gives me a different error.
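For reference, this is roughly how the user-classpath-first flags are passed at submit time. A minimal sketch, assuming a YARN deployment; the application class and jar name here are hypothetical, and note that both flags are marked experimental and can trigger other classloading conflicts against CDH-provided classes:

```shell
# Sketch: prefer classes from the user's uber jar over the cluster's jars.
# com.example.StreamingApp and the jar name are placeholders, not from the thread.
spark-submit \
  --class com.example.StreamingApp \
  --master yarn \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-app-with-kafka-0.10-assembly.jar
```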
Created 02-02-2017 06:46 AM
Spark 2 adds support for Kafka 0.10; would it be possible to use Spark 2? http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
The Kafka client changed significantly in 0.10, so replacing the client jars will be difficult. Using Spark 2 will be the easiest path.
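With Spark 2, the 0.10 integration normally comes from the separate spark-streaming-kafka-0-10 artifact rather than from replaced jars. A minimal sketch of pulling it in at submit time; the Scala/Spark version coordinates and the class and jar names are illustrative and should be matched to your actual Spark 2 build:

```shell
# Sketch: fetch the Kafka 0.10 integration module via --packages.
# Versions (2.11, 2.1.0) and com.example.StreamingApp are placeholder assumptions.
spark2-submit \
  --class com.example.StreamingApp \
  --master yarn \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.1.0 \
  my-app.jar
```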
Created on 02-06-2017 02:02 PM - edited 02-06-2017 02:03 PM
We are using Spark 2 on CDH 5.10. It was installed using the instructions at http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
But this version pulls in the 0.9 Kafka client jars, and we need Kafka 0.10.
Created 02-13-2017 08:41 PM
Yes, you're right, Cloudera does package the 0.9 version of the Apache Kafka client libraries with Spark 2. Kafka 0.10 brokers are supposed to be able to support the 0.9 client; is there a specific error you are receiving, or are you just looking for some of the new features added in the 0.10 client?
Unfortunately, if you need the 0.10 client version, you will need to replace the 0.9 client jar with the 0.10 client jar as you described, or run a separate Spark binary not managed by Cloudera Manager, making sure to include the YARN and Hive configurations when launching the unmanaged binaries.
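Launching an unmanaged Spark binary while still using the cluster's YARN and Hive configurations might look like the following. This is a sketch under assumptions: the install path /opt/spark-custom is made up, and the config locations shown are the typical CDH client-config defaults, which may differ on your hosts:

```shell
# Sketch: point an unmanaged Spark at the CDH-managed cluster configs.
# /opt/spark-custom and the app class/jar are placeholder assumptions.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
/opt/spark-custom/bin/spark-submit \
  --master yarn \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.StreamingApp \
  my-app.jar
```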
Created 06-30-2017 05:14 AM
Based on the documentation "When running jobs that require the new Kafka integration, set SPARK_KAFKA_VERSION=0.10 in the shell before launching spark-submit. Use the appropriate environment variable syntax for your shell"...
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html#running_jobs
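For a bash-style shell, the documented approach looks like this; the application class and jar name are placeholders:

```shell
# Per the linked docs: select the Kafka 0.10 integration before launching.
export SPARK_KAFKA_VERSION=0.10
spark2-submit --class com.example.StreamingApp my-app.jar
```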
Ferenc Erdelyi, Technical Solutions Manager