create HDFS file in hourly splits


My requirement is to create a Parquet/Avro file every hour from Spark using Structured Streaming.

Is there a way to control this file rollover at the Spark or Structured Streaming level? That is, data arriving between 12 AM and 1 AM should be written to one file, data arriving between 1 AM and 2 AM to another file, and so on.

I understand that a partitioned Hive table could achieve this. However, the customer wants to avoid Hive and is looking for possible alternatives to address this requirement.

Thanks in advance.
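
One Hive-free direction that may fit the requirement above is the Structured Streaming file sink with partitionBy on date and hour columns derived from the event timestamp. The following is a minimal PySpark sketch, not a confirmed solution. The column name event_time, the function names, and the paths are all hypothetical. It assumes the stream carries a timestamp column, and it yields one directory per hour rather than strictly one file: each hourly directory may hold several files (roughly one per micro-batch and task), so a single file per hour would likely need a separate compaction step.

```python
from datetime import datetime


def hourly_partition_dir(base_path: str, event_time: datetime) -> str:
    """Directory layout that partitionBy("date", "hour") is expected to produce.

    Pure helper for locating one hour's output; it does not need Spark.
    """
    return f"{base_path}/date={event_time:%Y-%m-%d}/hour={event_time.hour}"


def start_hourly_parquet_sink(events, base_path: str, checkpoint_path: str):
    """Start a streaming query that writes `events` into one directory per hour.

    Assumption: `events` is a streaming DataFrame with a timestamp column
    named `event_time` (a hypothetical name, adjust to the real schema).
    """
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    # Derive the partition columns from the event timestamp.
    bucketed = (
        events
        .withColumn("date", F.to_date("event_time"))
        .withColumn("hour", F.hour("event_time"))
    )

    return (
        bucketed.writeStream
        .format("parquet")               # file sink, append mode by default
        .partitionBy("date", "hour")     # one sub-directory per date and hour
        .option("path", base_path)
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="1 hour")
        .start()
    )
```

With this layout, records stamped between 12 AM and 1 AM on 2024-01-15 would be expected under a path such as hourly_partition_dir("/data/events", datetime(2024, 1, 15, 0, 30)), and no Hive metastore is involved. The hour is derived in the Spark session time zone, so that setting matters for where the boundaries fall. Writing Avro instead of Parquet should only require changing the format, assuming the separate spark-avro module is available on the cluster.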