Issue of container OOM when writing Dataframe to parquet files in Spark Job
Created on 01-16-2023 12:40 AM - edited 01-16-2023 07:08 AM
I'm using a Machine Learning Workspace in Cloudera Data Platform (CDP). I created a session with 4 vCPU / 16 GiB memory and enabled Spark 3.2.0.
I use Spark to load one month of data (around 12 GB in total), apply some transformations, and then write the result as parquet files to AWS S3.
My Spark session configuration looks like this:
spark = (
    SparkSession.builder
    .appName(appName)
    .config("spark.driver.memory", "8G")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "4")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8G")
    .config("spark.sql.shuffle.partitions", 500)
    ......
)
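For scale, the dynamic-allocation bounds in this configuration imply the following peak resource request (plain arithmetic on the numbers already shown above, no Spark required):

```python
# Figures copied from the .config() calls above.
max_executors = 20      # spark.dynamicAllocation.maxExecutors
executor_cores = 4      # spark.executor.cores
executor_mem_gb = 8     # spark.executor.memory

peak_cores = max_executors * executor_cores
peak_mem_gb = max_executors * executor_mem_gb
print(peak_cores, peak_mem_gb)  # 80 160
```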
Before the data are written to parquet files, they are repartitioned:
(
    df.withColumn("salt", floor(rand() * 100))  # floor and rand from pyspark.sql.functions; math.floor cannot take a Column
    .repartition("date_year", "date_month", "date_day", "salt")
    .drop("salt")
    .write.partitionBy("date_year", "date_month")
    .mode("overwrite")
    .parquet(SOME__PATH)
)
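The salting step above is what spreads a skewed day across many shuffle partitions. A minimal pure-Python sketch of the same idea (the row counts here are invented for illustration; in PySpark the salt would come from `pyspark.sql.functions.floor(rand() * 100)`):

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical skew: one hot day holds almost all of the month's rows.
rows = [("2023", "01", "15")] * 10_000 + [("2023", "01", "16")] * 100

# Mirror the DataFrame logic: salt = floor(rand() * 100), then group by
# (year, month, day, salt) instead of just (year, month, day).
buckets = Counter(
    (year, month, day, random.randrange(100)) for (year, month, day) in rows
)

# Without the salt, the hot day would land in one 10,000-row partition;
# with it, the day is spread over ~100 buckets of ~100 rows each.
hot_day = [key for key in buckets if key[2] == "15"]
print(len(hot_day))                          # ~100 distinct buckets
print(max(buckets[key] for key in hot_day))  # far below 10,000
```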
The data transformation in Spark always runs successfully, but the job consistently fails at the last step, when writing the data to parquet files.
Below is an example of the error message:
23/01/15 21:10:59 678 ERROR TaskSchedulerImpl: Lost executor 2 on 100.100.18.155: The executor with id 2 exited with exit code -1(unexpected). The API gave the following brief reason: Evicted The API gave the following message: Pod ephemeral local storage usage exceeds the total limit of containers 10Gi.
I don't think the problem is in my Spark configuration. It looks like the limit on Kubernetes ephemeral local storage is being hit, and I don't have permission to change that limit.
Can someone explain why this happens and what a possible solution would be?
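One plausible mechanism, sketched with the numbers from the post (the sizes are assumptions, not measurements): `.repartition()` triggers a full shuffle, and executors write shuffle blocks to their local directories, which on Kubernetes count against the pod's ephemeral-storage limit:

```python
# Back-of-envelope only; all sizes are assumptions based on the post.
input_gb = 12            # one month of data
shuffle_gb = input_gb    # a full repartition rewrites the whole dataset

# Dynamic allocation can scale down toward the minimum before the write:
min_executors = 4        # spark.dynamicAllocation.minExecutors
per_pod_gb = shuffle_gb / min_executors
print(per_pod_gb)  # 3.0

# On-disk shuffle blocks are only part of ephemeral usage: fetched blocks,
# sort spill across the 500 shuffle partitions, and temporary files staged
# by the overwrite-mode parquet write add to it, so peak local usage can be
# several times this per-pod share and cross a 10 GiB limit.
```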
Created 01-16-2023 09:09 PM
Hello @Ryan_2002
Thanks for engaging the Cloudera Community, and thank you for the detailed description of the problem. Your question is valid, but a community post isn't a suitable place to review it in depth. Would it be feasible for you to engage Cloudera Support so our team can work with you directly? Support allows both screen-sharing sessions and log exchange, neither of which is possible in the Community, and that would greatly expedite the review of your ask.
Regards, Smarak
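For readers who hit the same eviction: the usual remedy is to move the executors' local directories off ephemeral storage onto mounted volumes. A sketch of the relevant settings, collected as a plain dict (the key names follow the Spark-on-Kubernetes documentation, and the volume name follows Spark's convention that volumes named `spark-local-dir-*` are used for scratch space; the mount path here is an assumption, and whether these settings are tunable in a given CDP workspace depends on the platform):

```python
# Sketch only -- these keys would normally be passed via .config(...) on
# the SparkSession builder.
local_storage_conf = {
    # Request an on-demand PVC per executor ("OnDemand" is the literal
    # claim-name value Spark recognizes for dynamically created PVCs).
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName": "OnDemand",
    # Mount it where Spark keeps shuffle/spill data; volumes named
    # spark-local-dir-* are automatically used as local directories.
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path": "/data",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false",
}
print(len(local_storage_conf))  # 3
```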
