
How to run spark df.write inside UDF called in rdd.foreach or rdd.foreachpartition



In other words, how can I use the SparkSession object inside an executor?

1 ACCEPTED SOLUTION

Contributor

Hello @Jack_sparrow

Glad to see you on the Community. 

As far as I know, df.write cannot be called inside rdd.foreach or rdd.foreachPartition.
df.write is a driver-side operation: it triggers a Spark job, and only the driver can submit jobs.
The functions you pass to rdd.foreach or rdd.foreachPartition run on the executors, where the SparkSession (and with it the DataFrame API) is not available.
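
As a rough illustration, here is a minimal PySpark sketch of the pattern that fails (the DataFrame, column names and output path are made-up placeholders, not taken from your code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)  # placeholder DataFrame with a single "id" column

def save_partition(rows):
    # This function is shipped to the executors. There is no usable
    # SparkSession there, so building a DataFrame and calling .write
    # cannot work: PySpark refuses to even serialize the session into
    # the task (an error along the lines of "SparkContext can only be
    # used on the driver"), and in Scala the same pattern typically
    # surfaces as a NullPointerException.
    spark.createDataFrame([(r.id,) for r in rows], ["id"]) \
        .write.parquet("/tmp/out")  # placeholder path

df.rdd.foreachPartition(save_partition)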

Check these references: 
https://stackoverflow.com/questions/46964250/nullpointerexception-creating-dataset-dataframe-inside-...
https://sparkbyexamples.com/spark/spark-foreachpartition-vs-foreach-explained

The option that should work for your case is a partitioned write done from the driver:

df.write.partitionBy

Something like this: 

df.write.partitionBy("someColumn").parquet("/path/out")
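
For a fuller picture, here is a minimal PySpark sketch (the column name, sample data and output path below are placeholders, assuming a plain Parquet write):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder data standing in for your real DataFrame.
df = spark.createDataFrame(
    [(1, "2024-01-01", "a"), (2, "2024-01-02", "b")],
    ["id", "someColumn", "payload"],
)

# Runs on the driver: Spark creates one sub-directory per distinct value
# of "someColumn" (e.g. /path/out/someColumn=2024-01-01/), and the executors
# write their partitions in parallel; no SparkSession is needed on them.
df.write.mode("overwrite").partitionBy("someColumn").parquet("/path/out")

That way the write is still coordinated by the driver, but the output ends up split by key, which is usually what a per-partition df.write inside foreachPartition is trying to achieve.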

 


Regards,
Andrés Fallas


3 REPLIES

Community Manager

@Jack_sparrow, Welcome to our community! To help you get the best possible answer, I have tagged in our Spark experts @haridjh and @vafs, who may be able to assist you further.

Please feel free to provide any additional information or details about your query. We hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager





Thank you for the response.