Created 09-08-2025 05:04 AM
How to run Spark df.write inside a UDF called from rdd.foreach or rdd.foreachPartition?
In other words, how can the SparkSession object be used inside an executor?
Created 09-08-2025 01:12 PM
Hello @Jack_sparrow,
Glad to see you on the Community.
As far as I know, df.write cannot be used inside rdd.foreach or rdd.foreachPartition.
The reason is that df.write is a driver-side action: it triggers a Spark job, and jobs can only be launched from the driver.
The functions you pass to rdd.foreach or rdd.foreachPartition run on the executors, and executors cannot trigger jobs or use the SparkSession.
Check these references:
https://stackoverflow.com/questions/46964250/nullpointerexception-creating-dataset-dataframe-inside-...
https://sparkbyexamples.com/spark/spark-foreachpartition-vs-foreach-explained
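To make the failure concrete, here is a minimal PySpark sketch of the pattern described in the question (the column names and output path are placeholders, not taken from your code). The exact error message depends on the language and Spark version, but the root cause is the same: the session only exists on the driver.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foreach-write-demo").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "someColumn"])

def write_partition(rows):
    # This function runs on an executor. Referencing `spark` here does not work:
    # PySpark refuses to ship the SparkSession/SparkContext inside the closure
    # ("SparkContext can only be used on the driver"), and in Scala/Java the
    # session ends up null on the executor, which is the NullPointerException
    # from the Stack Overflow link above.
    part_df = spark.createDataFrame(list(rows), ["id", "someColumn"])
    part_df.write.parquet("/path/out")

df.rdd.foreachPartition(write_partition)  # fails for the reason described above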
The option that is most likely to work for your case is to do the write on the driver and let Spark split the output by column, using df.write.partitionBy.
Something like this:
df.write.partitionBy("someColumn").parquet("/path/out")
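For completeness, a self-contained PySpark sketch of that driver-side approach (the column name and output path are just the placeholders from the snippet above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitionby-demo").getOrCreate()

# Small example DataFrame; in practice this is your real data.
df = spark.createDataFrame(
    [(1, "2025-08-01"), (2, "2025-08-02"), (3, "2025-08-01")],
    ["id", "someColumn"],
)

# The call is made on the driver, but the actual file writing is distributed
# across the executors. Spark creates one sub-directory per distinct value of
# someColumn, e.g. /path/out/someColumn=2025-08-01/.
df.write.mode("overwrite").partitionBy("someColumn").parquet("/path/out")

This gives you one output directory per key without ever needing the SparkSession inside foreach or foreachPartition.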
Created 09-08-2025 08:32 AM
@Jack_sparrow, Welcome to our community! To help you get the best possible answer, I have tagged in our Spark experts @haridjh and @vafs, who may be able to assist you further.
Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 09-08-2025 09:01 PM
Thank you for the response.