Need suggestion on one use in Kafka

Hi All,

I need your suggestions on the use case below.

Cluster info: a Kafka setup with 10 partitions; one consumer group has 5 consumers.

Incoming data to the Kafka cluster is time-series based. I want to make sure that the time-series data is processed chronologically.

Regards,

Gobi.S

3 REPLIES
Re: Need suggestion on one use in Kafka

Super Collaborator

Is data guaranteed to be produced chronologically? Can you afford to embed a timestamp into the message and sort client-side?

Kafka guarantees order only within a single partition. The partition can be chosen from a hash of some key, so, for example, all events for user_id X will land in the same partition and stay ordered there. See: https://stackoverflow.com/questions/29820384/apache-kafka-order-of-messages-with-multiple-partitions
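
To make the key-based routing concrete, here is a rough sketch in plain Python that only simulates the idea; it does not talk to a real broker. The names `partition_for` and `route` are made up for illustration, and CRC32 is a stand-in hash (Kafka's default partitioner uses murmur2). The point is just that every event with the same key ends up in the same partition log, in arrival order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 10  # matches the 10-partition setup in the question


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash for illustration only: Kafka's default partitioner
    # uses murmur2, not CRC32. Any deterministic hash shows the idea.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (key, value) event to its partition's log, in arrival order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions


# Hypothetical sensor keys; the values stand for readings produced in order.
events = [
    ("sensor-a", 1),
    ("sensor-b", 1),
    ("sensor-a", 2),
    ("sensor-b", 2),
    ("sensor-a", 3),
]
logs = route(events)
```

With 10 partitions and 5 consumers, each consumer would read about two partitions, so you keep parallelism across keys while each key is still processed in order. There is no ordering guarantee between different keys.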

Re: Need suggestion on one use in Kafka

@Jordan Moore

Thanks for the link. It clearly says that you have to have a single partition in order to handle this situation.

But this will not help with parallel processing.

By the way, what is the solution to the problem statement: having one partition, or will some other tool help?

Re: Need suggestion on one use in Kafka

Super Collaborator

If you want to use multiple partitions, in my experience, you would handle that by embedding a production timestamp in each message, then extracting it at the consumer level. For example, you could dump the data into some time-series-capable database, then query it ordered by timestamp.
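
A minimal sketch of that idea in plain Python, with no Kafka client involved. The JSON layout, the field name `produced_at_ms`, and both helper functions are assumptions made up for this example. It sorts a finite batch in memory; for a continuous stream you would need a time window or a database, as described above.

```python
import json


def make_message(payload, produced_at_ms):
    """Producer side: embed the production time in the message body."""
    record = {"produced_at_ms": produced_at_ms, "payload": payload}
    return json.dumps(record).encode("utf-8")


def order_by_production_time(raw_messages):
    """Consumer side: decode a batch and sort it by the embedded timestamp."""
    decoded = [json.loads(m.decode("utf-8")) for m in raw_messages]
    return sorted(decoded, key=lambda m: m["produced_at_ms"])


# Messages as a consumer might receive them when reading several
# partitions at once: arrival order does not match production order.
received = [
    make_message("c", 3000),
    make_message("a", 1000),
    make_message("b", 2000),
]
ordered = order_by_production_time(received)
```

Sorting by the embedded timestamp only works if the producers' clocks are reasonably in sync, so that is worth checking before relying on it.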
