Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

New Contributor

The CDH docs state:

    - Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API.

 

Spark 2.0 now uses the 0.10 consumer API, which resolves this issue (https://issues.apache.org/jira/browse/SPARK-12177), but the current CDH version does not include Spark 2.0.
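For context, the security settings in question are consumer configuration keys that only the new (0.9+) consumer understands. Below is a minimal sketch of such a parameter map in plain Python, just to make the keys concrete. The helper name `secure_kafka_params`, the broker host names, the group id and the truststore path are all hypothetical; the configuration keys themselves are standard Kafka consumer settings.

```python
def secure_kafka_params(brokers, group_id, protocol="SASL_SSL",
                        kerberos_service="kafka", truststore=None):
    """Build a config map for the new (0.9+) Kafka consumer.

    Hypothetical helper for illustration only; it just assembles
    standard consumer configuration keys into a dict.
    """
    allowed = {"PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"}
    if protocol not in allowed:
        raise ValueError("unsupported security.protocol: %s" % protocol)

    params = {
        "bootstrap.servers": ",".join(brokers),
        "group.id": group_id,
        # The old 0.8 consumer has no equivalent of this setting.
        "security.protocol": protocol,
    }
    if protocol.startswith("SASL"):
        # Kerberos principal name the brokers run as.
        params["sasl.kerberos.service.name"] = kerberos_service
    if protocol.endswith("SSL") and truststore:
        params["ssl.truststore.location"] = truststore
    return params


# Hypothetical broker addresses, group id and truststore path.
params = secure_kafka_params(
    ["broker1.example.com:9093", "broker2.example.com:9093"],
    group_id="streaming-app",
    truststore="/path/to/truststore.jks",
)
```

As far as I understand, the Kafka 0.10 integration in Spark 2.x (Scala/Java) takes a map like this as its Kafka parameters, whereas the 0.8-based direct stream has nowhere to put the security keys.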

 

Questions:

1. Is there a possibility to backport this pull request into existing CDH distributions - if so what is the ETA?

2. When will Spark 2.0 be included in a CDH release?

 

This is a real-world problem: clients are requesting secure Kafka with Spark Streaming.

 

Previous ticket that tested this issue without success: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...

4 REPLIES

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

Master Collaborator

I believe Spark 2 will be available soon for CDH 5, but I don't think there's an official announcement about it.

Actually, CDH5.7+ already _only_ works with Kafka 0.9+. I confess ignorance about the extent to which that enables security. I know upstream Spark 1.x only works with Kafka 0.8, which was too limiting in this regard.

 

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

New Contributor

Thanks for coming back Sean...

 

Understood that CDH 5.7+ works with Kafka 0.9+ (I recall these sorts of things when looking at Oryx on CDH :-)).

 

I suppose my question was badly formed - we are interested in whether or not Cloudera would:

1. Support a custom variant of Spark 1.6.x in CDH which uses Kafka 0.9+ to enable security with Kafka and/or

2. Be willing to backport this functionality 

 

Thanks again,

Mark

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

Master Collaborator

It may be that this already works, since the existing distribution connects to 0.9. I am not familiar enough to say; I think it still uses the old consumer since upstream does.

 

I expect that if a change is needed for this to work, it will come with Spark 2 rather than as an update to the existing 1.x distribution, because that would be a potentially big incompatible change.

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

New Contributor

Just had a flick through a previous post on this on 24th May this year:  https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...

Kerberos-authentication-CDH5/m-p/41384#M1731

 

The final paragraph states that it does not currently work:

"For anyone else who tries this, the summary is it won't work due to upstream Spark issue [SPARK-12177], which deals with support for the new Kafka 0.9 consumer / producer API"

 

It sounds like the answer is that this won't be supported until the next CDH release (assuming it includes Spark 2.0).

 

Thanks for the insights.
