
create HDFS file in hourly splits




My requirement is to create a Parquet/Avro file every hour from Spark Structured Streaming.

Is there a way to control this file rollover at the Spark or Structured Streaming level? For example, data arriving from 12 AM to 1 AM should be written to one file, and data arriving from 1 AM to 2 AM should go to another.

I understand that a partitioned Hive table can achieve this, but the customer wants to avoid Hive and is looking for alternatives that address this requirement.

Thanks in advance.
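
One Hive-free way to approach the hourly rollover described above might be to let the Structured Streaming file sink partition its output by date and hour, so each hour lands in its own HDFS directory. The sketch below is only an illustration under assumptions: the streaming DataFrame is assumed to have a timestamp column named `event_time`, and the names `hourly_partition_path`, `start_hourly_parquet_sink`, `event_date`, `event_hour`, `base_dir`, and `checkpoint_dir` are hypothetical. Note that `partitionBy` yields one directory per hour rather than exactly one file; each micro-batch may write one or more files into it.

```python
from datetime import datetime


def hourly_partition_path(base_dir, event_time):
    """Return the directory a record would land in under the layout below.

    Mirrors the key=value directory naming that partitionBy uses for the
    hypothetical columns event_date and event_hour.
    """
    day = event_time.strftime("%Y-%m-%d")
    return f"{base_dir.rstrip('/')}/event_date={day}/event_hour={event_time.hour}"


def start_hourly_parquet_sink(events, base_dir, checkpoint_dir):
    """Start a streaming query that writes Parquet files split by hour.

    Assumption: `events` is a streaming DataFrame with a timestamp column
    named `event_time` (a hypothetical name, not taken from the question).
    """
    # Imported lazily so the path helper above works without Spark installed.
    from pyspark.sql import functions as F

    bucketed = (
        events
        .withColumn("event_date", F.to_date("event_time"))
        .withColumn("event_hour", F.hour("event_time"))
    )
    return (
        bucketed.writeStream
        .format("parquet")
        .partitionBy("event_date", "event_hour")  # one directory per hour
        .option("path", base_dir)
        .option("checkpointLocation", checkpoint_dir)
        .trigger(processingTime="1 hour")  # assumption: hourly micro-batches
        .start()
    )


if __name__ == "__main__":
    # Records from 12 AM to 1 AM and from 1 AM to 2 AM map to different directories.
    print(hourly_partition_path("/data/events", datetime(2024, 1, 1, 0, 30)))
    print(hourly_partition_path("/data/events", datetime(2024, 1, 1, 1, 5)))
```

Because the split is driven by the `event_time` value rather than by when the batch runs, a late record would still go to the directory for its own hour. If arrival time is what matters, a column derived from `current_timestamp()` could be used instead. Downstream jobs can read a single hour by pointing at that directory directly, with no Hive metastore involved.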

