Spark and Kafka broker with SSL (or Kerberos) authentication [CDH5.7]

Contributor

Hi,

Before I go to the effort of setting up a Cloudera 5.7 cluster with Kafka, Spark and Kerberos enabled to test it out, can anyone answer the following:

- Does Cloudera's distribution of Spark 1.6.0 support SSL or Kerberos authentication to a Kafka broker?

It looks like vanilla Spark 1.6.0 (and its spark-streaming-kafka jar) builds against Kafka 0.8, while I assume CDH's Spark is built against 0.9, since that's the version that ships with CDH Kafka 2.0.1. As far as I can tell, vanilla Spark doesn't support SSL or Kerberos authentication to Kafka topics.
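For reference, this is roughly how the 0.8-based direct stream is driven in vanilla Spark 1.6 (a minimal sketch; the broker address and topic name are placeholders). As far as I can tell, there is nowhere in this API to hang SSL or Kerberos settings:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object Kafka08DirectStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka08-test"), Seconds(10))

    // Old 0.8-style consumer configuration: there are no keys such as
    // "security.protocol" or "ssl.keystore.location" in this API.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))

    stream.map(_._2).print()  // just print the message values
    ssc.start()
    ssc.awaitTermination()
  }
}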

Many thanks


4 REPLIES

Master Collaborator
Yes, all of that is correct regarding how the CDH build differs from upstream and which Kafka version it is built against. Although I admit I haven't tried it directly, my understanding is that this is precisely so that you can use security with Kafka and Spark Streaming.

Contributor

Thanks @srowen.

Is the source code of Cloudera's Spark distribution publicly available somewhere, so I can take a look at how to configure it?

Expert Contributor

Apache Spark's source code is mirrored on GitHub: https://github.com/apache/spark

Cloudera also publishes the source code for each component on GitHub, with branches for the different CDH versions: https://github.com/cloudera

CDH 5.7 Spark 1.6 source code: https://github.com/cloudera/spark/tree/cdh5-1.6.0_5.7.0
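If you want to compile against the CDH artifacts rather than upstream, something like this sbt fragment should work (a sketch; the repository URL and the exact 1.6.0-cdh5.7.0 version string follow the usual CDH naming scheme, so please verify them against Cloudera's repository):

// build.sbt fragment: depend on CDH's Spark build instead of upstream.
// Version string assumed from the usual CDH naming scheme; verify before use.
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming"       % "1.6.0-cdh5.7.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0-cdh5.7.0"
)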

As for configuration, you can find the security documentation for Kafka here: http://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html

and for Spark here: http://www.cloudera.com/documentation/enterprise/latest/topics/sg_spark_auth.html

and here: http://www.cloudera.com/documentation/enterprise/latest/topics/sg_spark_encryption.html

Contributor

Thanks @hubbarja.

I spent the afternoon trying this out on the CDH 5.7.0 QuickStart VM, with a Kerberos-enabled cluster and Cloudera Kafka 2.0.0. I think I didn't quite phrase my question clearly: what I was trying to ask was whether the spark-streaming-kafka client can consume from a Kafka cluster that requires client SSL authentication.
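To be concrete, by "requires client SSL authentication" I mean a broker configured roughly like this (a server.properties sketch for Kafka 0.9; the hostname, paths and passwords are placeholders):

# Kafka 0.9 broker fragment (sketch): SSL listener with mandatory client certs
listeners=SSL://broker1:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/path/to/kafka.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/path/to/kafka.server.truststore.jks
ssl.truststore.password=changeit
# This line is what forces clients to present certificates:
ssl.client.auth=required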

For anyone else who tries this, the summary is that it won't work, due to upstream Spark issue [SPARK-12177], which tracks support for the new Kafka 0.9 consumer/producer API. SSL, SASL_PLAINTEXT and SASL_SSL connections to Kafka all require the new API.
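For contrast, here's the sort of thing the new 0.9 consumer API accepts (a sketch in plain Kafka client code rather than Spark; the addresses and keystore paths are placeholders). These are exactly the settings that the current spark-streaming-kafka integration has no way to pass:

import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object NewConsumerSslTest {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9093")
    props.put("group.id", "ssl-test")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    // These security settings only exist in the new (0.9) consumer API.
    props.put("security.protocol", "SSL")
    props.put("ssl.truststore.location", "/path/to/client.truststore.jks")
    props.put("ssl.truststore.password", "changeit")
    // Needed because the broker sets ssl.client.auth=required.
    props.put("ssl.keystore.location", "/path/to/client.keystore.jks")
    props.put("ssl.keystore.password", "changeit")
    props.put("ssl.key.password", "changeit")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("my-topic"))
    consumer.poll(1000).asScala.foreach(r => println(r.value()))  // poll(timeoutMs) in the 0.9 API
    consumer.close()
  }
}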

In fact, this issue is referenced in the known issues published with CDH 5.7.0; I just didn't spot it in time.

There's a pull request on GitHub here which appears to add SSL support (but not any form of Kerberos client authentication), if anyone feels brave.

Looking at the comments on the Spark ticket, it will be at least after the Spark 2.0.0 release before this feature gets merged in, and probably not until 2.1.0.

Back to the drawing board for me!