Here is the problem scenario:
We receive CSV files from a legacy system. They are consumed every hour, with 1 TB of data per hourly batch.
We are in the process of creating a data model that can store this data.
This data model needs to support millions of reads per second.
Question 1: What persistent storage system can we use to store this data, and how can a Spark job in Scala take the CSV file and store it in the storage system of our choice?
Question 2: What real-time or batch processing technologies can we use?
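
To make the Spark part of Question 1 concrete, the job we have in mind looks roughly like the sketch below. It is only an illustration under assumptions: the input and output paths are hypothetical, the CSV is assumed to have a header row, and Parquet is used purely as a placeholder sink because the actual storage system is still undecided.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvIngestJob {
  def main(args: Array[String]): Unit = {
    // Hypothetical locations; pass real ones as the first and second argument.
    val inputPath  = args.headOption.getOrElse("/data/incoming/*.csv")
    val outputPath = args.lift(1).getOrElse("/data/staged/parquet")

    val spark = SparkSession.builder()
      .appName("csv-ingest-sketch")
      .getOrCreate()

    // Assumption: the files have a header row. Without an explicit schema
    // every column is read as a string; inferSchema would add an extra pass
    // over the data, which is costly at 1 TB per batch.
    val df = spark.read
      .option("header", "true")
      .csv(inputPath)

    // Placeholder sink: swap this write for the connector of whichever
    // storage system is chosen.
    df.write
      .mode(SaveMode.Append)
      .parquet(outputPath)

    spark.stop()
  }
}
```

The job would be submitted once per hourly batch (for example with spark-submit), and only the final write step should need to change once the storage system is picked.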
Many thanks!