I have a use case for real-time streaming: we will be using Kafka (0.9) as the message buffer and Spark Streaming (1.6) for stream processing (HDP 2.4). We will receive ~80-90K events/sec over public HTTP. Can you please suggest a recommended architecture for ingesting this data into Kafka topics, which will then be consumed by Spark Streaming? Possible approaches (not sure):
1) Is there any kind of Kafka connector (e.g. Kafka Connect) for reading messages from HTTP and writing them into Kafka topics at scale?
2) Can we connect Spark Streaming to HTTP directly and store the messages into Kafka topics, which would then be consumed by Spark Streaming?
3) Is Flume listening on HTTP and forwarding to Kafka (the "Flafka" pattern) a good option for real-time streaming?
Please share any other possible approaches.
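Regarding option 2: whatever front end receives the HTTP traffic, the usual pattern is to keep it thin and stateless, buffer incoming events, and produce them to Kafka in batches so the producer isn't doing one network round-trip per event. Below is a minimal, hedged sketch of that batching logic in Python. The `send_batch` callback, batch size, and delay threshold are illustrative assumptions; in a real deployment `send_batch` would wrap a Kafka producer call (for example kafka-python's `KafkaProducer.send`), which is kept out of the sketch so it runs without a broker.

```python
import time

class EventBatcher:
    """Buffers incoming HTTP events and flushes them in batches.

    `send_batch` is an injected callable taking a list of events.
    In production it would publish the batch to a Kafka topic; here
    it is pluggable so the batching logic is testable on its own.
    """
    def __init__(self, send_batch, max_batch=500, max_delay_s=0.05):
        self.send_batch = send_batch      # called with each full batch
        self.max_batch = max_batch        # flush after this many events...
        self.max_delay_s = max_delay_s    # ...or when the oldest event is this old
        self.buffer = []
        self.first_event_time = None

    def add(self, event):
        # Record when the current batch started filling.
        if not self.buffer:
            self.first_event_time = time.monotonic()
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_batch
                or time.monotonic() - self.first_event_time >= self.max_delay_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []
            self.first_event_time = None

# Demo: collect flushed batches in a list instead of sending to Kafka.
batches = []
batcher = EventBatcher(batches.append, max_batch=3)
for i in range(7):
    batcher.add({"event_id": i})
batcher.flush()  # flush the final partial batch
print([len(b) for b in batches])  # → [3, 3, 1]
```

At 80-90K events/sec you would run several such front-end instances behind a load balancer and let Kafka partitioning spread the write load across brokers.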
This is a perfect scenario for Apache NiFi: pick up the events over HTTP and publish them to a Kafka topic. The processors to do this are already built and included with the distribution (HDF). NiFi also delivers very high throughput and scales well.
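As a rough sketch, the flow is just two standard processors wired together. The port, path, broker list, and topic name below are illustrative placeholders, and exact property names vary somewhat between NiFi releases (older NiFi versions use PutKafka for 0.8/0.9 brokers, newer ones PublishKafka):

```
ListenHTTP                # exposes an HTTP endpoint your producers POST events to
  Listening Port: 8080
  Base Path: events
      |
      v  (success relationship)
PublishKafka              # PutKafka on older NiFi releases
  Kafka Brokers: broker1:6667,broker2:6667
  Topic Name: raw-events
  Delivery Guarantee: Guaranteed Replicated Delivery
```

For this kind of event rate you would typically run a small NiFi cluster behind a load balancer rather than a single node, and size the Kafka topic's partition count to match your Spark Streaming consumer parallelism.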