
how to write Spark structured streaming output "foreachbatch" dataframe to one CSV file?


I am using Spark Structured Streaming with "foreachBatch" in PySpark to write CSV files to blob storage. I am seeing an individual CSV file for every micro-batch in my storage. Is it possible to append all of my micro-batches to a single CSV file? This is the code I am using:
def foreach_batch_function(batchDF, epoch_id):
    # Score the current micro-batch with the pre-trained ALS model;
    # the second argument (number of recommendations per user) is
    # required by recommendForUserSubset -- 10 is a placeholder value
    userSubsetRecs = model2.recommendForUserSubset(batchDF, 10)
    # coalesce(1) yields a single part file per batch, but every
    # micro-batch still writes its own part-*.csv into the directory
    userSubsetRecs.coalesce(1).write.format('csv').mode("append").save('/mnt/data/bbb.csv')

streamingDF.writeStream \
    .outputMode("append") \
    .foreachBatch(foreach_batch_function) \
    .start() \
    .awaitTermination()
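For context, Spark's file writer treats '/mnt/data/bbb.csv' as a directory and drops a separate part-*.csv file into it for every micro-batch; it never appends to one physical file. One workaround is to concatenate the part files afterwards. The sketch below uses only the Python standard library; the part-file naming follows Spark's usual output convention, it assumes the parts were written without headers, and the demo directory stands in for the real output path:

```python
import glob
import os
import shutil
import tempfile

def merge_part_files(parts_dir, merged_path):
    # Concatenate every part-*.csv Spark wrote into parts_dir
    # into a single CSV file at merged_path, in sorted order.
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    with open(merged_path, "wb") as out:
        for part in part_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

# Demo with stand-in part files; in a real run, parts_dir would be
# the '/mnt/data/bbb.csv' directory that Spark writes into.
tmp = tempfile.mkdtemp()
for i, rows in enumerate(["1,a\n", "2,b\n", "3,c\n"]):
    with open(os.path.join(tmp, f"part-{i:05d}.csv"), "w") as f:
        f.write(rows)

merged = os.path.join(tmp, "merged.csv")
merge_part_files(tmp, merged)
print(open(merged).read())  # 1,a / 2,b / 3,c on separate lines
```

In practice you would run the merge after the stream stops (or on a schedule), pointing it at the output directory; note that if Spark wrote headers into each part file, a plain concatenation would duplicate them.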

 

 
