Real time streaming read from HTTP using Kafka and Spark Streaming

I have a use case for real-time streaming: we will use Kafka (0.9) as the message buffer and Spark Streaming (1.6) for stream processing (HDP 2.4). We will receive ~80-90K events/sec over public HTTP. Can you please suggest a recommended architecture for ingesting this data into Kafka topics, which will then be consumed by Spark Streaming? Possible approaches (not sure about these):

1) Is there any kind of Kafka connector (like Kafka Connect) for reading messages from HTTP and saving them into Kafka topics at scale?

2) Can we connect Spark Streaming to HTTP and store the messages in Kafka topics, which would then be consumed by Spark Streaming?

3) Is Flume listening on HTTP and sending to Kafka (Flafka) a good option for real-time streaming?

Please share other possible approaches if any.


@Nilesh Pandey

This is a perfect scenario to use Apache NiFi to pick up the logs and put them into a Kafka topic. The processors to do this are already built and included with the distribution (HDF). NiFi is also very high throughput and very scalable.

@emaxwell Thanks for your response, but if I don't have the NiFi option, what would be the optimal design/approach for this use case?
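
For reference, whichever tool ends up doing the ingestion (Flume, a connector, or a custom service), the shape is the same: a thin HTTP receiver that batches incoming events and hands them to a Kafka producer. Below is a minimal sketch of that shape in Python, using only the standard library. The names `EventForwarder`, `make_handler`, and the `send(topic, batch)` callable are hypothetical and used only for illustration; `send` is a stand-in for whatever Kafka producer call you would actually use, not a real Kafka client API. A single process like this is unlikely to sustain the ~80-90K events/sec mentioned above, so treat it as an illustration of the pattern rather than a sized design.

```python
from http.server import BaseHTTPRequestHandler


class EventForwarder:
    """Buffers raw events and passes them to `send` in batches.

    `send` is a hypothetical callable taking (topic, list_of_events).
    In a real deployment it would wrap a Kafka producer; here it is
    only a placeholder so the batching logic can be shown on its own.
    """

    def __init__(self, send, topic, batch_size=500):
        self.send = send
        self.topic = topic
        self.batch_size = batch_size
        self.buffer = []

    def add(self, event):
        # Collect the event; flush once the batch is full.
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand over whatever is buffered and start a fresh buffer.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(self.topic, batch)


def make_handler(forwarder):
    """Builds a request handler class that feeds POST bodies to `forwarder`."""

    class IngestHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read exactly the number of bytes the client declared.
            length = int(self.headers.get("Content-Length", 0))
            forwarder.add(self.rfile.read(length))
            # 202: the event was accepted for asynchronous delivery.
            self.send_response(202)
            self.end_headers()

        def log_message(self, fmt, *args):
            # Silence per-request logging.
            pass

    return IngestHandler
```

To try it locally, something like `HTTPServer(("0.0.0.0", 8080), make_handler(EventForwarder(my_send, "events"))).serve_forever()` would start the receiver, where `my_send` and the topic name `events` are placeholders. The stock `HTTPServer` handles one request at a time, so the buffer needs no locking here; a threaded or multi-process receiver would need to guard it, and a periodic `flush()` would be needed so that a partially filled batch does not wait indefinitely.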