08-21-2018 02:04 AM
In our current architecture, consisting of CDH 5.8.4 with Spark 1.6, Kafka 0.9 and Scala 2.10, we have Spark Streaming applications working with Kafka, which use the following libraries (artifacts):
Recently, we wanted to migrate our architecture to CDH 5.14.2 (with Spark 1.6, Kafka 1.0.1 and Scala 2.10).
We are having problems migrating our Spark Streaming applications, especially when we try to use the following corresponding libraries (artifacts):
kafka_2.10 ( ?? )
As you can see, we did not find a corresponding artifact for the kafka_2.10 library. Do you have any feedback on that? Is it perhaps necessary to also upgrade the Scala version (to 2.11), the Spark version, or both?
Has anyone already run into these migration problems with Spark 1.6 and Kafka 1.0.1, and could you share a working configuration for CDH 5.14.2?
Thank you in advance
08-29-2018 09:23 PM
Spark Streaming in CDH 5.14 uses Kafka clients based on Apache Kafka 0.10.2, that is, Cloudera Kafka 2.2.0. You can see how the Spark gateway sets up the classpath in /etc/spark/conf/classpath.txt:
[root@host-514 ~]# hadoop version
Hadoop 2.6.0-cdh5.14.2
Subversion http://github.com/cloudera/hadoop -r 5724a4ad7a27f7af31aa725694d3df09a68bb213
Compiled by jenkins on 2018-03-27T20:40Z
Compiled with protoc 2.5.0
From source with checksum 302899e86485742c090f626a828b28
This command was run using /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hadoop-common-2.6.0-cdh5.14.2.jar
[root@host-514 ~]# cat /etc/spark/conf/classpath.txt |grep kafka |grep -v flume
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/kafka-clients-0.10.2-kafka-2.2.0.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/kafka_2.10-0.10.2-kafka-2.2.0.jar
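If it helps to compare what the cluster ships against what the application build declares, the same grep can be wrapped so that each jar name is split into an artifact name and a version string. This is only a rough sketch: kafka_artifacts is a hypothetical helper name, not something shipped with CDH, and it assumes the jar names follow the <artifact>-<version>.jar pattern seen in the output above, with the version starting at the first hyphen followed by a digit.

```shell
# Hypothetical helper (not part of CDH): list the Kafka jars named in a
# classpath file and print each one as "<artifact> <version>".
# Assumption: the version starts at the first "-" that is followed by a digit.
kafka_artifacts() {
  # $1: path to a classpath file with one jar path per line
  grep kafka "$1" | grep -v flume |
    # drop the directory part and the .jar extension
    sed -E 's|.*/||; s|\.jar$||' |
    # split "<artifact>-<version>" at the first hyphen followed by a digit
    sed -E 's/^([^-]*(-[^0-9-][^-]*)*)-([0-9].*)$/\1 \3/'
}
```

Running kafka_artifacts /etc/spark/conf/classpath.txt on the gateway host above should print kafka-clients and kafka_2.10, each followed by 0.10.2-kafka-2.2.0. Whether that version string is also what the build should request for the kafka_2.10 artifact is an assumption to check against the repository the build actually uses.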