pyspark - Kafka integration : how to control the data written to Kafka topic?


I am learning Kafka–Spark Streaming and trying to build a small test setup as follows:

  1. A small Python program reads data from a file.
  2. It writes each record to a Kafka topic (see the producer sketch after this list).
  3. A second PySpark program consumes the data from the Kafka topic.
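
For step 2, the producer is roughly the following (a minimal sketch, assuming the kafka-python package and a local broker; the broker address, file path, and topic name are placeholders):

```python
from kafka import KafkaProducer

# Placeholder broker address; adjust to your cluster.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

with open("input.txt", "r") as f:
    for line in f:
        # Send each line of the file as a UTF-8 encoded message.
        producer.send("test-topic", line.strip().encode("utf-8"))

# flush() blocks until all buffered records have been delivered.
producer.flush()
producer.close()
```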

I have observed many batches being submitted in the Kafka consumer log while the data is written to the Kafka topic. My questions: Will Spark submit a batch for each record? I have 70,000 records in the file; will it submit 70,000 batches to read the data from the Kafka topic? Is there a limit on the number of batches that can be submitted? And can I group 10,000 records into one batch and write that to the Kafka topic?
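
For reference, my consumer is roughly the following (a minimal sketch, assuming the spark-streaming-kafka-0-8 integration; the broker address, topic name, and the 10-second batch interval are placeholders):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaFileConsumer")
# The second argument is the batch interval: micro-batches are cut
# by time (here every 10 seconds), not by record count.
ssc = StreamingContext(sc, 10)

stream = KafkaUtils.createDirectStream(
    ssc, ["test-topic"], {"metadata.broker.list": "localhost:9092"})

# Each element is a (key, value) pair; print a few values per micro-batch.
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()
```

My understanding is that the batch interval passed to StreamingContext controls how often a micro-batch is created, and that spark.streaming.kafka.maxRatePerPartition can cap how many records each batch pulls, but I would appreciate confirmation on whether that is the right way to control batch sizes here.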