Can Kafka process multiple files?

Contributor

Can Kafka process multiple files and then send them to Spark Streaming?

Accepted Solution

Re: Can Kafka process multiple files?

@mel mendoza

Kafka is a message broker, so it only receives events (for example, file contents sent as messages) from publishers and makes them available to consumers. The broker itself does not do any processing.

Spark Streaming dictates how the events are read. Because Spark Streaming does micro-batching, it reads several events from Kafka and processes them together in one micro-batch.

I believe this will achieve what you are asking for; it will happen on the Spark side, though, not in Kafka.
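
To make the micro-batching idea concrete, here is a minimal plain-Python sketch. It is not Spark's API and only illustrates the grouping; the function name `micro_batches`, the event tuples, and the one-second interval are assumptions made up for this example.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload)
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Group time-ordered events into fixed-length windows.

    Every event whose arrival time falls into the same window of
    `interval` seconds ends up in the same batch. Empty windows
    produce no batch.
    """
    batch: List[str] = []
    current_window = None
    for arrival, payload in events:
        window = int(arrival // interval)
        # A new window starts: hand over everything collected so far.
        if current_window is not None and window != current_window:
            yield batch
            batch = []
        current_window = window
        batch.append(payload)
    if batch:
        yield batch


if __name__ == "__main__":
    # Lines from several files arrive as individual events.
    stream = [
        (0.1, "a.csv: row 1"),
        (0.4, "b.csv: row 1"),
        (1.2, "a.csv: row 2"),
        (2.5, "c.csv: row 1"),
        (2.9, "b.csv: row 2"),
    ]
    for batch in micro_batches(stream, interval=1.0):
        print(batch)
```

Events that originate from different files land in the same batch as long as they arrive within the same interval, which is the sense in which multiple files get processed together.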

As always, if you find this post helpful, don't forget to "accept" the answer.

Re: Can Kafka process multiple files?

Contributor

@Eyad Garelnabi

So that means I should go straight to Spark to process multiple files?

Re: Can Kafka process multiple files?

Guru

Hello @mel mendoza ,

Kafka is basically not a file-based system but an event-based one. If you want to process files with Spark Streaming via Kafka, you need a two-step approach: first ingest the data into Kafka, then consume the events from Kafka with Spark Streaming.

To ingest into Kafka you can, for example, use Kafka Connect with the file source (check /usr/hdp/current/kafka-broker/conf/connect-file-source.properties). It works like a "tail -f" on that file and streams any incoming data from that file to the Kafka topic.
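
The "tail -f" behaviour can be sketched in plain Python as follows. This is only an illustration of the idea, not the Kafka Connect implementation: the function `publish_new_lines` and the `publish` callback are hypothetical, and in a real setup the callback would hand each line to a Kafka producer.

```python
from typing import Callable


def publish_new_lines(path: str, offset: int, publish: Callable[[str], None]) -> int:
    """Publish every complete line appended to `path` after byte `offset`.

    Returns the new offset, so the caller can poll again later and pick up
    only the data that arrived in the meantime. A trailing line without a
    newline is left alone until it is completed (a choice of this sketch).
    """
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                # End of file or an incomplete line: stop for now.
                break
            publish(line.rstrip(b"\r\n").decode("utf-8"))
            offset = f.tell()
    return offset
```

Calling it in a loop, for example `offset = publish_new_lines("input.txt", offset, print)` followed by a short sleep, gives the one-event-per-new-line behaviour described above for a single file.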

Afterwards, you consume the events from that Kafka topic with your Spark Streaming job.
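
A rough sketch of the consuming side in PySpark could look like the following. Note the assumptions: it uses Spark's Structured Streaming Kafka source rather than the older DStream API, it needs `pyspark` plus the matching Spark Kafka connector package on the classpath, and the broker address, topic name, and the helper names `kafka_source_options` and `run_console_stream` are placeholders invented for this example.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build the options for Spark's Kafka source (values are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Read the topic from the beginning on the first run.
        "startingOffsets": "earliest",
    }


def run_console_stream(bootstrap_servers: str, topic: str) -> None:
    """Read lines from a Kafka topic and print each micro-batch to the console."""
    # Imported lazily so the helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-file-lines").getOrCreate()
    lines = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka delivers the value as bytes; cast it to a string column.
        .selectExpr("CAST(value AS STRING) AS line")
    )
    query = lines.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

Something like `run_console_stream("localhost:9092", "connect-test")` would then start the job, assuming a broker at that address and a topic of that name exist.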

HTH, Gerd

Re: Can Kafka process multiple files?

Contributor

Thanks @Gerd Koenig !

For processing multiple files in real time, what application/technology would you recommend?

Re: Can Kafka process multiple files?

Guru

Hi @mel mendoza ,

Maybe it is worth checking Flume to ingest multiple files into Kafka. Alternatively, you can use HDF (particularly NiFi) to do so.

Re: Can Kafka process multiple files?

Contributor

Thanks again! I'm currently using NiFi for data collection. I will try NiFi to Kafka.
