I am using kafka-console-producer to simply cat files and redirect them into a Kafka topic. The topic has 4 partitions, the cluster has 2 brokers, and the replication factor is 2.
After a few days I realized that some events (rows) from the input files are not in Kafka at all. No file is missing entirely, so it is not that a file was never read and redirected to the Kafka producer; rather, some rows were not "committed" or transferred to the Kafka topic. This happens continuously: out of, say, 200 files, about 10 are incomplete, and 0.1% or even less of the data is missing. During this time there was no outage on the system: nothing at the OS level, no service disruption on Kafka, no out-of-memory errors, nothing. Everything is green in CDH.
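To make the symptom concrete, here is a minimal sketch of how the missing rows can be counted. It assumes the topic has been dumped to a text file (one message per line, for example with kafka-console-consumer) and that rows can be compared as plain strings. The helper names `read_rows` and `find_missing_rows` and the file names in the comments are hypothetical, not part of my actual setup.

```python
from collections import Counter


def read_rows(path):
    """Read a text file and return its rows without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def find_missing_rows(input_rows, topic_rows):
    """Return {row: count} for rows that occur more often in the input
    than in the topic dump. Duplicated rows are counted, not collapsed."""
    # Counter subtraction keeps only positive counts, i.e. rows that
    # are present in the input but absent (or under-represented) in the topic.
    return dict(Counter(input_rows) - Counter(topic_rows))


# Hypothetical usage; both file names are placeholders:
# missing = find_missing_rows(read_rows("input_0001.txt"), read_rows("topic_dump.txt"))
# print(len(missing), "distinct rows missing")
```

Because the comparison uses counts rather than sets, a row that appears twice in the input but only once in the topic is still reported as missing once.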
Is there a reliable way to push a text file to a Kafka topic? I tried the producer property acks=1, but it did not help.
Is it normal for some rows to go missing during the push?
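For comparison, this is a rough sketch of the kind of push I would consider "reliable": every row is sent and the sender blocks until that row is acknowledged, so a failure surfaces as an exception instead of a silently missing row. It assumes a client object with `send()` returning a future and a `flush()` method, as the kafka-python `KafkaProducer` provides; that library is not part of my current setup, and `push_file` is a hypothetical helper. Waiting on every record is slow and is meant only to illustrate the idea.

```python
def push_file(producer, topic, path, timeout_s=30):
    """Send each line of `path` to `topic` and wait for every acknowledgement.

    `producer` is assumed to behave like kafka-python's KafkaProducer:
    send() returns a future whose get() raises if the record was not
    acknowledged within the timeout.
    """
    sent = 0
    with open(path, "rb") as handle:
        for line in handle:
            row = line.rstrip(b"\r\n")  # drop the line terminator only
            # Block until the broker confirms this record; an error here
            # means the row did NOT reach the topic.
            producer.send(topic, value=row).get(timeout=timeout_s)
            sent += 1
    producer.flush()  # make sure nothing is left in the client buffer
    return sent


# Hypothetical usage (requires kafka-python; the broker address is a placeholder):
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="broker1:9092", acks="all", retries=5)
# print(push_file(producer, "my_topic", "input_0001.txt"), "rows acknowledged")
```

The returned count can be compared with the number of lines in the file to confirm that every row was acknowledged.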