I need to decide how to write my data to Hadoop.
I'm using Spark to consume messages from a Kafka topic; each message is a JSON record.
I have around 200B records per day.
The data fields may change in the future (not a lot, but some may change).
I need fast writes, fast reads, and a small footprint on disk.
Which should I choose: Avro or Parquet?
If I choose Parquet or Avro, do I need to create the table with all the fields of my JSON?
If not, how do I create the table in Parquet format or Avro format?