Kafka Streams vs Structured Streaming: which is better?


At a high level I know the difference between Kafka Streams and Structured Streaming. However, which is currently the better choice for production use?


Super Collaborator

That depends on your task. Kafka itself is a message broker, and Kafka Streams is a client library for processing data that lives in Kafka topics, while Spark (I assume you mean Spark Structured Streaming) is a parallel processing framework. So if your task is to do some number crunching on the stream input, it is worth taking a closer look at Spark Structured Streaming. If the task is more about distributing the data you receive in a stream to multiple clients (consumers) quickly and reliably, take a closer look at Kafka and Kafka Streams.

My personal experience is only with Kafka, but since Spark Structured Streaming is no longer marked experimental (as of Spark 2.2), it should be fine to use in production.

Cloudera Employee

Kafka Streams, as the name says, is bound to Kafka. It is a good tool when both the input and the output data are stored in Kafka and you want to perform simple operations on the stream.
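
To give a feel for what "simple operations" on a Kafka-to-Kafka stream can look like, here is a minimal sketch using the Kafka Streams Scala DSL. The application id, broker address, and topic names ("input-topic", "output-topic") are placeholders I made up, and the `serialization.Serdes` import assumes Kafka 2.6 or later (older releases expose the same implicits under `org.apache.kafka.streams.scala.Serdes`). Treat it as an illustration, not a production setup.

```scala
import java.util.Properties

import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.StreamsBuilder
import org.apache.kafka.streams.scala.serialization.Serdes._

object UppercaseApp extends App {
  // Hypothetical application id and broker address; adjust for your cluster.
  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app")
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

  val builder = new StreamsBuilder()

  // Read from one topic, drop empty records, transform the values,
  // and write the result to another topic. Both ends are Kafka.
  builder
    .stream[String, String]("input-topic")
    .filter((_, value) => value != null && value.nonEmpty)
    .mapValues(value => value.toUpperCase)
    .to("output-topic")

  val streams = new KafkaStreams(builder.build(), props)
  sys.addShutdownHook(streams.close()) // stop cleanly on JVM shutdown
  streams.start()
}
```

Note that this runs as an ordinary JVM application; you scale it by starting more instances with the same application id, and there is no separate processing cluster to operate.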

Spark Structured Streaming is highly scalable and can be used for Complex Event Processing (CEP) use cases. It can also do federated joins on data stored in multiple sources.
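
As a rough sketch of a join across sources, the example below enriches a Kafka stream with a static Parquet table in Spark Structured Streaming. The topic name, the `/data/customers` path, and the `customer_id` column are assumptions made for the example, and reading from Kafka requires the `spark-sql-kafka-0-10` package on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object EnrichEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("enrich-events").getOrCreate()

    // Streaming source: a Kafka topic (hypothetical broker and topic).
    // The record key is assumed to carry the customer id.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS customer_id",
        "CAST(value AS STRING) AS payload")

    // Static source: a Parquet table (hypothetical path) that is assumed
    // to contain a customer_id column.
    val customers = spark.read.parquet("/data/customers")

    // Stream-static inner join on the shared column.
    val enriched = events.join(customers, "customer_id")

    // Print the joined rows to the console; a real job would use a durable sink.
    val query = enriched.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

The static side could just as well be a JDBC or Hive table; the point is that the streaming and the static data do not have to live in the same system.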