The CDH docs state:
- Spark Streaming cannot consume from secure Kafka until it moves to the Kafka 0.9 consumer API.
Spark 2.0 now uses the 0.10 consumer API, which resolves this issue (https://issues.apache.org/jira/browse/SPARK-12177) - but the current CDH version does not include Spark 2.0.
1. Is there a possibility of backporting this pull request into existing CDH distributions - and if so, what is the ETA?
2. When will Spark 2.0 be included in a CDH release?
This is a real-world problem - clients are requesting secure Kafka with Spark Streaming.
Previous ticket that tested this issue without success: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...
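For context on what the fix would look like in practice: with the 0.10 integration from SPARK-12177, security is just a matter of passing the standard Kafka client properties through to the direct stream. A hedged sketch (broker host, truststore path/password, topic, and group id are all placeholder values, and this assumes a Spark 2.0 build with the spark-streaming-kafka-0-10 artifact):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Security settings are ordinary new-consumer (0.9+) client properties.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"        -> "broker1.example.com:9093",  // placeholder SSL listener
  "key.deserializer"         -> classOf[StringDeserializer],
  "value.deserializer"       -> classOf[StringDeserializer],
  "group.id"                 -> "example-group",             // placeholder
  "security.protocol"        -> "SSL",                       // or SASL_SSL for Kerberos+SSL
  "ssl.truststore.location"  -> "/etc/kafka/ssl/client.truststore.jks",  // placeholder path
  "ssl.truststore.password"  -> "changeit"                   // placeholder
)

// ssc is an existing StreamingContext
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array("example-topic"), kafkaParams)
)
```

None of this works against the 0.8-based connector in Spark 1.x, which is the crux of the issue.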
I believe Spark 2 will be available soon for CDH 5, but I don't think there's an official announcement about it.
Actually, CDH 5.7+ already works _only_ with Kafka 0.9+. I confess ignorance about the extent to which that enables security. I know upstream Spark 1.x only works with Kafka 0.8, which was too limiting in this regard.
Thanks for coming back, Sean...
Understood that CDH 5.7+ works with Kafka 0.9+ (I recall these sorts of things when looking at Oryx on CDH :-)).
I suppose my question was badly formed - we are interested in whether Cloudera would:
1. Support a custom variant of Spark 1.6.x in CDH that uses Kafka 0.9+ to enable security with Kafka, and/or
2. Be willing to backport this functionality
It may be that this already works, since the existing distribution connects to 0.9. I'm not familiar enough to say; I think it still uses the old consumer, since upstream does.
I expect that if a change is needed for this to work, it will come with Spark 2, not an update to the existing 1.x distribution, because it would be a potentially big incompatible change.
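To illustrate why this is an incompatible change rather than a config tweak: the 0.8 direct API is built on the old SimpleConsumer, which has no security settings at all, while the 0.9+ consumer takes security as ordinary client properties. A hedged contrast (host names and paths are placeholders):

```properties
# Kafka 0.8 direct stream (old SimpleConsumer path) - no security knobs exist
metadata.broker.list=broker1.example.com:9092

# Kafka 0.9+ new consumer - security is plain client configuration
bootstrap.servers=broker1.example.com:9093
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/kafka/ssl/client.truststore.jks
ssl.truststore.password=changeit
```

Moving from the first to the second means swapping the underlying consumer implementation, which is why it lands in Spark 2 rather than a 1.x update.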
Just had a flick through a previous post on this from 24th May this year: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-and-Kafka-broker-with-SSL-or...
The final paragraph states that it does not currently work:
"For anyone else who tries this, the summary is it won't work due to upstream Spark issue [SPARK-12177], which deals with support for the new Kafka 0.9 consumer / producer API"
It sounds like the answer is that this won't be supported until the next CDH release (assuming that includes Spark 2.0).
Thanks for the insights.