09-20-2017 05:55 AM
Hi I am trying to run a structured streaming on CDH 5.11.1 with Kafka 2.2.0-188.8.131.52.p0.68.
The streaming fails that there is no subscribe method with this definition:
Found out that this is due the change between 0.9 and 0.10 version.
My project contains a sbt pointing to a 0.10 artifacts,
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-sql-kafka-0-10_2.11" % "2.2.0"
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.2.1"
the CDH should use also Kafka 0.10. The spark-submit job has an explicit parameter:
spark-submit ... --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.kafka:kafka-clients:0.10.2.1
so I dont understand where the 0.9 comes from, and why it expects the older API.
17/09/20 14:48:07 ERROR streaming.StreamExecution: Query [id = 60109f59-c6fc-4956-b559-16b41570e45a, runId = 8c86c4a5-f510-4269-a33e-fb19390cb054] terminated with error
11-25-2017 01:02 AM
Hi I had the same problem, but I've copied solution from another problem, where I was setting kerberized Kafka with Sentry. To run Spark programm with kerberized kafka you have to switch spark tu use Kafka 0.10. You can do this in 2 ways:
1. Before spark2-submit you have to export kafka version
$ export SPARK_KAFKA_VERSION=0.10
$ spark2-submit ...
2. Set default Kafka version in spark2 service configuration in Cloudera Manager
I've also run structured streaming ETL and had the same problem as you with version 0.9. After I've set export my programm started properly.