I'm designing the architecture for our company's data infrastructure. It will receive payment and user-behavior events from an HTTP producer (Node.js).
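To make the setup concrete, here is a minimal sketch of what I have in mind for the producer side, assuming `kafkajs` and `avsc` (the package choice, topic name, schema, and broker address are all placeholders, not a final design). One caveat I'm aware of: if the downstream sink uses the Confluent Avro converter, the messages would need the Schema Registry wire framing (e.g. via `@kafkajs/confluent-schema-registry`) rather than raw `avsc` buffers.

```ts
import { Kafka } from "kafkajs";
import * as avro from "avsc";

// Avro schema for one of the event types; the fields are illustrative.
const paymentEvent = avro.Type.forSchema({
  type: "record",
  name: "PaymentEvent",
  fields: [
    { name: "userId", type: "string" },
    { name: "amountCents", type: "long" },
    { name: "occurredAt", type: "long" }, // epoch millis
  ],
});

const kafka = new Kafka({
  clientId: "http-event-producer",
  brokers: ["kafka-1:9092"], // placeholder broker address
});
const producer = kafka.producer();

// Called from the HTTP handler for each incoming event:
// serialize to Avro and publish, keyed by user for partitioning.
export async function publishPayment(event: {
  userId: string;
  amountCents: number;
  occurredAt: number;
}): Promise<void> {
  await producer.send({
    topic: "payment-events",
    messages: [{ key: event.userId, value: paymentEvent.toBuffer(event) }],
  });
}

async function main() {
  await producer.connect();
  await publishPayment({ userId: "u-123", amountCents: 4200, occurredAt: Date.now() });
  await producer.disconnect();
}

main().catch(console.error);
```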
My plan was to use Kafka, Hive, and Spark: produce the events to Kafka in Avro format and move them into HDFS and Hive with Kafka Connect. But I've read that Avro is not the preferred file format for processing with Spark and that I should use Parquet instead.
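For reference, this is roughly how I was going to register the sink through the Kafka Connect REST API (a sketch only, assuming Node 18+ for the global `fetch`; the connector options follow the Confluent HDFS sink connector as I understand it, and the hosts, topics, and Hive metastore URI are placeholders for our environment):

```ts
// Planned HDFS sink: land the Avro records in HDFS and register Hive tables.
const connectorConfig = {
  name: "events-hdfs-sink",
  config: {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "topics": "payment-events,behavior-events",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
    "hive.integration": "true",
    "hive.metastore.uris": "thrift://hive-metastore:9083",
    "schema.compatibility": "BACKWARD",
  },
};

async function registerConnector(): Promise<void> {
  // POST the connector definition to a Connect worker.
  const res = await fetch("http://connect:8083/connectors", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(connectorConfig),
  });
  if (!res.ok) throw new Error(`Connect API returned ${res.status}`);
}

registerConnector().catch(console.error);
```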
Now the question is: how can I generate Parquet and pass it to/from Kafka? I'm confused by the number of choices here. Any advice or resources are welcome. 🙂