New Contributor
Posts: 1
Registered: 08-21-2018

Migration from CDH 5.8.4 (Spark 1.6 - Kafka 0.9) to CDH 5.14.2 (Spark 1.6 - Kafka 1.0.1)

Hi,

In our current architecture, consisting of CDH 5.8.4 with Spark 1.6, Kafka 0.9 and Scala 2.10, we have Spark Streaming applications that work with Kafka and use the following libraries (artifacts):
    
    spark-core_2.10 (1.6.0-cdh5.8.4)
    spark-streaming_2.10 (1.6.0-cdh5.8.4)
    spark-streaming-kafka_2.10 (1.6.0-cdh5.8.4)
    kafka_2.10 (0.9.0-cdh5.8.4)
    hbase-spark (1.2.0-cdh5.8.4)

Recently, we decided to migrate our architecture to CDH 5.14.2 (with Spark 1.6, Kafka 1.0.1 and Scala 2.10).
We are having problems migrating our Streaming applications, especially when using the following corresponding libraries (artifacts):

    spark-core_2.10 (1.6.0-cdh5.14.2)
    spark-streaming_2.10 (1.6.0-cdh5.14.2)
    spark-streaming-kafka_2.10 (1.6.0-cdh5.14.2)
    kafka_2.10 ( ?? )
    hbase-spark (1.2.0-cdh5.14.2)

As you can see, we could not find a corresponding artifact for the kafka_2.10 library. Do you have any feedback about that? Is it perhaps necessary to also upgrade the Scala version (to 2.11), the Spark version, or both?

Has anyone already run into these migration problems with Spark 1.6 and Kafka 1.0.1, and can share a working configuration for CDH 5.14.2?

Thank you in advance.

Cloudera Employee
Posts: 56
Registered: 03-01-2016

Re: Migration from CDH 5.8.4 (Spark 1.6 - Kafka 0.9) to CDH 5.14.2 (Spark 1.6 - Kafka 1.0.1)

Spark Streaming in CDH 5.14 uses Kafka clients based on Apache Kafka 0.10.2, that is, the ones shipped with Cloudera Kafka 2.2.0. You can see this in the classpath that the Spark gateway sets up through /etc/spark/conf/classpath.txt:

[root@host-514 ~]# hadoop version
Hadoop 2.6.0-cdh5.14.2
Subversion http://github.com/cloudera/hadoop -r 5724a4ad7a27f7af31aa725694d3df09a68bb213
Compiled by jenkins on 2018-03-27T20:40Z
Compiled with protoc 2.5.0
From source with checksum 302899e86485742c090f626a828b28
This command was run using /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hadoop-common-2.6.0-cdh5.14.2.jar
[root@host-514 ~]# cat /etc/spark/conf/classpath.txt |grep kafka |grep -v flume
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/kafka-clients-0.10.2-kafka-2.2.0.jar
/opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/kafka_2.10-0.10.2-kafka-2.2.0.jar
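The two jar names above are enough to fill in the "kafka_2.10 ( ?? )" entry from the question: the Scala suffix is still _2.10 and the version string is 0.10.2-kafka-2.2.0, which suggests a Scala 2.10 build of the client is available. As a minimal sketch, assuming the jars follow the usual Maven <artifactId>-<version>.jar naming, the following POSIX sh functions turn the Kafka entries of classpath.txt into artifactId:version pairs. jar_to_coordinate and kafka_coordinates are hypothetical helpers written for illustration, not part of CDH.

```shell
#!/bin/sh
# Sketch: derive artifactId:version pairs from the Kafka jars listed in
# Spark's classpath.txt on a CDH gateway host.
# Assumption: jars follow the Maven naming <artifactId>-<version>.jar, where
# the version starts at the first "-" that is followed by a digit.

# Hypothetical helper: print "artifactId:version" for one jar path.
jar_to_coordinate() {
  name=$(basename "$1" .jar)
  # Everything before the first "-<digit>" is the artifactId, the rest is the version.
  printf '%s\n' "$name" | sed -E 's/^([^-]+(-[^0-9-][^-]*)*)-([0-9].*)$/\1:\3/'
}

# Hypothetical helper: apply jar_to_coordinate to the Kafka entries of a
# classpath file, skipping Flume jars as the grep in the post does.
kafka_coordinates() {
  classpath_file=${1:-/etc/spark/conf/classpath.txt}
  grep kafka "$classpath_file" | grep -v flume | while IFS= read -r jar; do
    jar_to_coordinate "$jar"
  done
}

# Only run automatically where the CDH gateway file actually exists.
if [ -r /etc/spark/conf/classpath.txt ]; then
  kafka_coordinates /etc/spark/conf/classpath.txt
fi
```

For the two classpath.txt lines shown in the post it prints kafka-clients:0.10.2-kafka-2.2.0 and kafka_2.10:0.10.2-kafka-2.2.0. The groupId to pair them with (presumably org.apache.kafka) and the Maven repository to resolve these versions from are not shown in the post, so they should be checked against your own build setup.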

 

 
