New Contributor
Posts: 5
Registered: ‎07-27-2016

Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

The CDH docs state:

    - Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API.

 

Spark 2.0 now uses the 0.10 consumer API, which resolves this issue (https://issues.apache.org/jira/browse/SPARK-12177), but the current CDH version does not include Spark 2.0.
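
For illustration, here is a rough sketch of the consumer settings that the new consumer API accepts for an SSL-secured broker. The class name, method name, broker address, group id, and truststore path are all hypothetical placeholders; the property keys are standard Kafka consumer configuration names. With the Spark 2.0 Kafka 0.10 integration, a map like this would typically be passed to `ConsumerStrategies.Subscribe` and then to `KafkaUtils.createDirectStream`, but this has not been tested against any CDH release.

```java
import java.util.HashMap;
import java.util.Map;

public class SecureKafkaParams {

    // Builds consumer settings for an SSL-secured Kafka broker.
    // Every argument is a placeholder supplied by the caller (assumption).
    static Map<String, Object> sslConsumerParams(String brokers,
                                                 String groupId,
                                                 String truststorePath,
                                                 String truststorePassword) {
        Map<String, Object> params = new HashMap<>();
        params.put("bootstrap.servers", brokers);
        params.put("group.id", groupId);
        // Deserializers are given as class names so this sketch needs no Kafka jar.
        params.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        params.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Security settings: only understood by the new (0.9+) consumer.
        params.put("security.protocol", "SSL");
        params.put("ssl.truststore.location", truststorePath);
        params.put("ssl.truststore.password", truststorePassword);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = sslConsumerParams(
                "broker1.example.com:9093",   // hypothetical broker address
                "example-group",              // hypothetical consumer group
                "/path/to/truststore.jks",    // hypothetical truststore path
                "changeit");                  // hypothetical password
        // Print the settings, masking the password.
        params.forEach((k, v) -> System.out.println(
                k + " = " + ("ssl.truststore.password".equals(k) ? "****" : v)));
    }
}
```

The old 0.8 consumer used by Spark 1.x has no equivalent of the `security.protocol` setting, which is why the restriction quoted above applies.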

 

Questions:

1. Is it possible to backport this change into existing CDH distributions? If so, what is the ETA?

2. When will Spark 2.0 be included in a CDH release?

 

This is a real-world problem: clients are requesting secure Kafka with Spark Streaming.

 

Previous ticket that tested this issue without success: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

I believe Spark 2 will be available soon for CDH 5, but I don't think there's an official announcement about it.

Actually, CDH5.7+ already _only_ works with Kafka 0.9+. I confess ignorance about the extent to which that enables security. I know upstream Spark 1.x only works with Kafka 0.8, which was too limiting in this regard.

 

New Contributor
Posts: 5
Registered: ‎07-27-2016

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

Thanks for coming back, Sean.

 

Understood that CDH 5.7+ works with Kafka 0.9+ (I recall these sorts of things when looking at Oryx on CDH :-)).

 

I suppose my question was badly formed - we are interested in whether or not Cloudera would:

1. Support a custom variant of Spark 1.6.x in CDH which uses Kafka 0.9+ to enable security with Kafka and/or

2. Be willing to backport this functionality 

 

Thanks again,

Mark

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

It may be that this already works, since the existing distribution connects to 0.9. I am not familiar enough to say; I think it still uses the old consumer since upstream does.

 

I expect that if a change is needed for this to work, it will come with Spark 2 rather than as an update to the existing 1.x distribution, because that would be a potentially big incompatible change.

New Contributor
Posts: 5
Registered: ‎07-27-2016

Re: Spark Streaming cannot consume from secure Kafka till it starts using Kafka 0.9 Consumer API

Just had a flick through a previous post on this on 24th May this year:  https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...

Kerberos-authentication-CDH5/m-p/41384#M1731

 

The final paragraph states that it does not currently work:

"For anyone else who tries this, the summary is it won't work due to upstream Spark issue [SPARK-12177], which deals with support for the new Kafka 0.9 consumer / producer API"

 

It sounds like the answer is that this won't be supported until the next CDH release (assuming it includes Spark 2.0).

 

Thanks for the insights.