
pyspark - Kafka integration : how to control the data written to Kafka topic?

I am learning Kafka and Spark Streaming, and I am trying to build a small test setup as follows:

  1. A small Python program reads data from a file.
  2. It writes each record to a Kafka topic.
  3. Another PySpark program consumes the data from the Kafka topic.
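For context, the producer side of step 1 and 2 looks roughly like the sketch below. This is not my exact code, just a minimal version assuming the kafka-python client, a broker at localhost:9092, a topic named "events", and an input file "input.txt" (all of these names are placeholders):

```python
def read_records(path):
    """Yield one record (line) per iteration from the input file."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def send_records(path, topic="events", bootstrap="localhost:9092"):
    # Import inside the function so the sketch is readable even
    # without kafka-python installed.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for record in read_records(path):
        # one producer.send() call per record read from the file
        producer.send(topic, record.encode("utf-8"))
    producer.flush()  # block until all buffered records are delivered

if __name__ == "__main__":
    send_records("input.txt")
```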

While writing the data to the Kafka topic, I observed many batches being submitted in the Kafka consumer log. My questions:

- Will it submit a batch for each record? I have 70,000 records in the file; will it submit 70,000 batches to read the data from the Kafka topic?
- Is there a limit on the number of batches that can be submitted?
- Can I group 10,000 records into one batch and write that to the Kafka topic?
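To illustrate what I mean by the last question, grouping 10,000 records and sending each group as a single Kafka message could look like this sketch (the chunk size, topic name, and broker address are my own assumptions; kafka-python's producer also does client-side batching on its own via its batch_size and linger_ms settings, which may be the more idiomatic route):

```python
def chunked(records, size=10000):
    """Group an iterable of records into lists of at most `size` items."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

def send_in_batches(records, topic="events", bootstrap="localhost:9092"):
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for batch in chunked(records, 10000):
        # one Kafka message per 10,000-record group, newline-joined
        producer.send(topic, "\n".join(batch).encode("utf-8"))
    producer.flush()
```

With 70,000 records and a chunk size of 10,000, this would produce 7 messages instead of 70,000.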