My requirement is to create a Parquet/Avro file every hour from a Spark Structured Streaming job.
Is there a way to control this file rollover at the Spark or Structured Streaming level? That is, data arriving between 12 AM and 1 AM should be written to one file, data arriving between 1 AM and 2 AM to another file, and so on.
I understand that a partitioned Hive table can achieve this, but the customer wants to avoid Hive and is looking for alternatives that meet this requirement.
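To pin down the hourly bucketing rule, here is a plain-Python sketch (no Spark involved). Each record's event time is truncated to the start of its hour, and that truncated value names the output location. The windows are half-open, so 00:59:59 still belongs to the 12 AM bucket and 01:00:00 starts the next one. The `hourly_bucket` helper and the `date=.../hour=...` naming are hypothetical, chosen to resemble Hive-style partition directories. In Spark, an analogous approach would be to derive date/hour columns from the event-time column and pass them to `DataStreamWriter.partitionBy` on the file sink. That approach does not need a Hive metastore, but it yields one directory per hour, possibly holding several files, rather than exactly one file.

```python
from datetime import datetime


def hourly_bucket(event_time: datetime) -> str:
    """Return a (hypothetical) directory name for the hour containing event_time.

    Windows are half-open: 00:00:00 <= t < 01:00:00 maps to hour=00.
    """
    # Truncate to the start of the hour so every record in the same
    # one-hour window shares a single bucket name.
    start = event_time.replace(minute=0, second=0, microsecond=0)
    # datetime supports strftime-style format specs inside f-strings.
    return f"date={start:%Y-%m-%d}/hour={start:%H}"


# Usage: records from 12 AM-1 AM and 1 AM-2 AM land in different buckets.
if __name__ == "__main__":
    events = [
        datetime(2024, 1, 1, 0, 15),
        datetime(2024, 1, 1, 0, 59, 59),
        datetime(2024, 1, 1, 1, 0),
    ]
    for e in events:
        print(e, "->", hourly_bucket(e))
```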