We are setting up a Kafka cluster for internal message and event communication. I am very new to Kafka and have some architectural questions:
1) When you create a topic and publish data, can we / should we keep all the information (100+ fields) in a single message?
2) Are there any standards we can follow for message size and retention days?
3) Any recommendation on the message format (plain text / JSON / Avro)? We will likely have a Hortonworks Kafka setup, so I am not sure whether a Schema Registry will be available.
4) I know partitions should give good throughput, but at what message volume and size should we consider adding partitions?
5) How can Kafka be used for batched datasets, e.g. a nightly data snapshot?
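For concreteness on questions 1 and 3, here is a rough sketch of what I mean by a wide message. The field names below are placeholders, not our real schema; I just want to gauge how a 100+ field JSON record compares to Kafka's default broker message limit (`message.max.bytes`, roughly 1 MB):

```python
import json

# Hypothetical record with 100 short fields (placeholder names/values,
# not our real schema) to estimate the serialized message size.
record = {f"field_{i}": f"value_{i}" for i in range(100)}

payload = json.dumps(record).encode("utf-8")
print(f"serialized size: {len(payload)} bytes")

# Kafka's default broker-side limit (message.max.bytes) is roughly 1 MB,
# so even a wide JSON record of short fields is typically far below it.
assert len(payload) < 1_048_576
```

My worry is less the default limit and more whether shipping all 100+ fields per event is good practice versus publishing a key and fetching details elsewhere.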
I searched the web for the above but found no direct answers to most of them. Please advise.
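On question 4, the only sizing approach I have seen so far is a back-of-the-envelope estimate like the sketch below. The per-partition throughput number is an assumption that would have to be measured on our own cluster, not a Kafka constant; I would appreciate confirmation of whether this reasoning is sound:

```python
import math

def estimate_partitions(target_mb_per_s: float,
                        per_partition_mb_per_s: float,
                        target_consumers: int = 1) -> int:
    """Rough partition count: enough partitions to reach the target
    throughput, and at least one partition per consumer in the group
    (extra consumers beyond the partition count sit idle)."""
    by_throughput = math.ceil(target_mb_per_s / per_partition_mb_per_s)
    return max(by_throughput, target_consumers)

# Example with assumed numbers: 50 MB/s target, ~10 MB/s measured per
# partition, and a consumer group of 8.
print(estimate_partitions(50, 10, 8))  # -> 8
```

Is this the right mental model, or are there volume/size thresholds where partitioning decisions change qualitatively?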