Integration of Apache Spark and Apache Kafka using PySpark

Hello,

I want to send messages from Kafka to Spark, use Spark SQL for data manipulation, and finally send the result to another Kafka topic.

When I use createStream:

from pyspark.streaming.kafka import KafkaUtils

kvs = KafkaUtils.createStream(ssc, "localhost:2181", "spark-streaming-consumer", {topic: 1})

it creates a TransformedDStream, and I am not able to convert it to a DataFrame so that I can use Spark SQL on it.
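
For reference, this is the kind of per-batch conversion I have been experimenting with, adapted from the DataFrame example in the Spark Streaming programming guide (a sketch only; the process function and the "messages" view name are placeholders, and I am assuming the message value is a plain string):

from pyspark.sql import SparkSession

# Convert each micro-batch RDD of (key, message) pairs into a DataFrame
# so Spark SQL can run on it.
def process(time, rdd):
    if rdd.isEmpty():
        return
    spark = SparkSession.builder.getOrCreate()
    # keep only the message text from each (key, message) pair
    df = spark.createDataFrame(rdd.map(lambda kv: (kv[1],)), ["value"])
    df.createOrReplaceTempView("messages")
    spark.sql("SELECT value FROM messages").show()

kvs.foreachRDD(process)

As I understand it, this only gives me a separate DataFrame per micro-batch rather than a single streaming DataFrame, which is why I looked at Structured Streaming next.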
Later, when I referred to https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html, I tried the following to get a DataFrame:

df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "SparkPublish")
      .load())

Even after this, I am getting various errors.
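
For completeness, the end-to-end pipeline I am aiming for looks roughly like this (a sketch only; the "SparkOut" output topic, the upper-case transformation, and the checkpoint path are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaSparkSQL").getOrCreate()

# read the source topic as a streaming DataFrame
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "SparkPublish")
      .load())

# Kafka delivers key and value as binary, so cast to string before SQL
df.selectExpr("CAST(value AS STRING) AS value") \
  .createOrReplaceTempView("messages")
transformed = spark.sql("SELECT upper(value) AS value FROM messages")

# write the result back out to the second topic
query = (transformed.writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "SparkOut")
         .option("checkpointLocation", "/tmp/spark-kafka-checkpoint")
         .start())
query.awaitTermination()

I understand the Kafka source also needs the spark-sql-kafka-0-10 package on the classpath (for Spark 2.4.5, --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 when submitting), so that may be related to the errors I am seeing.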

Can anyone tell me how I can receive messages from Kafka as DataFrames using Spark Streaming and use Spark SQL on them?

Thank you.
