01-31-2017 09:03 AM
Hi,
We need to use Kafka 0.10 with CDH 5.9 for various reasons.
CDH 5.9 comes bundled with Kafka 0.9 and ships the Kafka 0.9 client jar in SPARK_HOME.
Including the Kafka 0.10 jar in the Spark fat (uber) jar does not automatically override the one in SPARK_HOME, resulting in the following exception:
17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka version : 0.9.0-kafka-2.0.0
17/01/31 16:32:36 INFO utils.AppInfoParser: Kafka commitId : unknown
Exception in thread "streaming-start" java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.subscribe(Ljava/util/Collection;)V
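For context, this is roughly the call path that fails. Below is a minimal sketch of the Spark 2 / spark-streaming-kafka-0-10 direct stream setup (the broker address, topic, group id, and app name are placeholders, not our real values):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(new SparkConf().setAppName("kafka010-example"), Seconds(5))

// Placeholder connection settings.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",
  "auto.offset.reset"  -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams)
)

stream.map(record => record.value).print()

// When the stream starts, the 0.10 integration calls
// KafkaConsumer.subscribe(java.util.Collection), a method that does not
// exist in the 0.9 client jar, which is where the NoSuchMethodError
// above is thrown (note it comes from the "streaming-start" thread).
ssc.start()
ssc.awaitTermination()
```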
One way to work around this is to move the Kafka 0.9 jars out of SPARK_HOME. Is there any other way to do it?
I tried setting spark.executor.userClassPathFirst, but that gives me a different error.
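For reference, a sketch of how those properties can be set (the app name is a placeholder; both properties are marked experimental in the Spark docs):

```scala
import org.apache.spark.SparkConf

// Experimental Spark properties that make user-provided jars win over the
// cluster's classpath. spark.driver.userClassPathFirst only takes effect
// in cluster mode; in client mode the driver classpath must be adjusted
// via --driver-class-path on spark-submit instead.
val conf = new SparkConf()
  .setAppName("kafka010-userclasspath")
  .set("spark.executor.userClassPathFirst", "true")
  .set("spark.driver.userClassPathFirst", "true")
```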
02-02-2017 06:46 AM
Spark 2 adds support for Kafka 0.10; would it be possible to use Spark 2? http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
The Kafka client API changed significantly in 0.10, so replacing the client jars will be difficult. Using Spark 2 will be the easiest path.
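If you do move to Spark 2, note that the 0.10 integration ships as a separate artifact that pulls in its own kafka-clients jar transitively. A minimal build.sbt sketch (version numbers are illustrative; match them to your Spark 2 parcel):

```scala
// build.sbt: illustrative versions only.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming"            % "2.1.0" % "provided",
  // Brings in the 0.10.x kafka-clients jar transitively.
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0"
)
```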
02-06-2017 02:02 PM - edited 02-06-2017 02:03 PM
We are using Spark 2 on CDH 5.10. It was installed using the instructions at http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
But this version still pulls in the 0.9 Kafka client jars, and we need Kafka 0.10.
02-13-2017 08:41 PM
Yes, you're right, Cloudera does package the 0.9 version of the Apache Kafka client libraries with Spark 2. Kafka 0.10 brokers are supposed to be able to support the 0.9 client; is there a specific error you are receiving, or are you just looking for some of the new features added in the 0.10 client?
Unfortunately, if you need the 0.10 client version, you will need to replace the 0.9 client jar with the 0.10 client jar as you described, or run a separate Spark binary not managed by Cloudera Manager, making sure to include the YARN and Hive configurations when launching the unmanaged binaries.
06-30-2017 05:14 AM
Per the documentation: "When running jobs that require the new Kafka integration, set SPARK_KAFKA_VERSION=0.10 in the shell before launching spark-submit. Use the appropriate environment variable syntax for your shell."
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html#running_jobs
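As a quick sanity check after exporting SPARK_KAFKA_VERSION=0.10, the job itself can log which kafka-clients jar actually won the classpath race. A small sketch using the same class that produced the "Kafka version" log lines earlier in this thread:

```scala
import org.apache.kafka.common.utils.AppInfoParser

// Reports the version of whichever kafka-clients jar was loaded; with
// SPARK_KAFKA_VERSION=0.10 exported before spark-submit, this should
// print a 0.10.x version rather than 0.9.0-kafka-2.0.0.
println(s"Kafka client version : ${AppInfoParser.getVersion}")
println(s"Kafka client commitId: ${AppInfoParser.getCommitId}")
```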