I need your suggestions on the use case below.
Cluster info: a 10-partition Kafka setup, with one consumer group of 5 consumers.
Incoming data to the Kafka cluster is time-series based. I want to make sure that the time-series data is processed chronologically.
Is data guaranteed to be produced chronologically? Can you afford to embed a timestamp into the message and sort client-side?
Kafka guarantees order only within a single partition. The partition can be chosen from a hash of some key, so, for example, all events for user_id X will land in the same partition and stay ordered relative to each other. See: https://stackoverflow.com/questions/29820384/apache-kafka-order-of-messages-with-multiple-partitions
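To make the per-key ordering concrete, here is a rough sketch in plain Python that simulates key-based routing without a broker. The names `partition_for` and `route` and the sensor keys are made up for illustration, and CRC32 is only a stand-in hash (Kafka's default partitioner uses murmur2), so the actual partition numbers will differ from a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 10  # matches the 10-partition setup in the question


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash for illustration only; real Kafka clients use murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (key, value) event to its partition's log in arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions


# Interleaved events from two hypothetical sources.
events = [
    ("sensor-a", 1),
    ("sensor-b", 1),
    ("sensor-a", 2),
    ("sensor-b", 2),
    ("sensor-a", 3),
]
logs = route(events)

# Every event for a given key sits in one partition, in the order it was sent.
for key in ("sensor-a", "sensor-b"):
    values = [v for k, v in logs[partition_for(key)] if k == key]
    print(key, "-> partition", partition_for(key), values)
```

Ordering holds per key, not across keys: nothing here says how sensor-a's events relate in time to sensor-b's if they end up in different partitions.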
Thanks for the link. It clearly says that you have to have a single partition in order to handle this situation.
But that will not help with parallel processing.
By the way, what is the solution to the problem? Is it having one partition, or will some other tool help?
If you want to use multiple partitions, in my experience, you would handle that by embedding the message production time in each message and extracting it at the consumer level. For example, you could dump the data into a time-series-capable database, then query it ordered by timestamp.
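A minimal sketch of that idea, assuming JSON messages and using an in-memory SQLite table as a stand-in for a real time-series database. The field name `produced_at_ms`, the table `readings`, and the helpers `make_message` and `store` are all hypothetical; the Kafka producer and consumer calls are left out so the example runs on its own.

```python
import json
import sqlite3


def make_message(produced_at_ms: int, value: float) -> bytes:
    # Producer side: embed the production time in the payload itself.
    payload = {"produced_at_ms": produced_at_ms, "value": value}
    return json.dumps(payload).encode("utf-8")


def store(conn: sqlite3.Connection, raw: bytes) -> None:
    # Consumer side: extract the embedded timestamp and persist the reading.
    msg = json.loads(raw)
    conn.execute(
        "INSERT INTO readings (produced_at_ms, value) VALUES (?, ?)",
        (msg["produced_at_ms"], msg["value"]),
    )


conn = sqlite3.connect(":memory:")  # stand-in for a time-series database
conn.execute("CREATE TABLE readings (produced_at_ms INTEGER, value REAL)")

# Messages as consumers on different partitions might see them: out of order.
arrivals = [
    make_message(3000, 0.3),
    make_message(1000, 0.1),
    make_message(2000, 0.2),
]
for raw in arrivals:
    store(conn, raw)
conn.commit()

# Chronological order is restored at query time, not at consume time.
ordered = conn.execute(
    "SELECT produced_at_ms, value FROM readings ORDER BY produced_at_ms"
).fetchall()
print(ordered)
```

With this approach the 5 consumers can keep reading the 10 partitions in parallel, since order only matters when the data is read back. It does rely on producer clocks being reasonably in sync if more than one producer writes timestamps.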