Member since: 01-12-2023 · Posts: 3 · Kudos Received: 0 · Solutions: 0
01-16-2023 12:40 AM
I'm using Machine Learning Workspace in Cloudera Data Platform (CDP). I created a session with 4 vCPU / 16 GiB memory and enabled Spark 3.2.0. I'm using Spark to load one month of data (around 12 GB in total), apply some transformations, and then write the result as Parquet files to AWS S3. My Spark session configuration looks like this:

```python
SparkSession.builder
    .appName(appName)
    .config("spark.driver.memory", "8G")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "4")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8G")
    .config("spark.sql.shuffle.partitions", 500)
    ......
```

Before the data are written to Parquet files, they are repartitioned:

```python
# floor and rand are pyspark.sql.functions (Python's math.floor does not work on a Column)
(df.withColumn("salt", floor(rand() * 100))
   .repartition("date_year", "date_month", "date_day", "salt")
   .drop("salt")
   .write.partitionBy("date_year", "date_month")
   .mode("overwrite")
   .parquet(SOME__PATH))
```

The data transformations with Spark run successfully, but the job always fails in the last step, when writing the data to Parquet files. Below is an example of the error message:

```
23/01/15 21:10:59 678 ERROR TaskSchedulerImpl: Lost executor 2 on 100.100.18.155:
The executor with id 2 exited with exit code -1(unexpected).
The API gave the following brief reason: Evicted
The API gave the following message: Pod ephemeral local storage usage exceeds the total limit of containers 10Gi
```

I don't think there is a problem with my Spark configuration. The problem is the Kubernetes ephemeral local storage size limit, which I do not have the rights to change. Can someone explain why this happened and what a possible solution for it would be?
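For context: an eviction like this usually means shuffle and spill files written to the executor pod's local scratch directories (`spark.local.dir`) exceeded the pod's ephemeral-storage limit. One workaround documented for Spark on Kubernetes (3.1+) is to back executor local directories with on-demand persistent volumes instead of ephemeral storage, so scratch data no longer counts against that 10 Gi limit. Below is a minimal sketch of the relevant settings, assuming the platform honours them in a CML session; the storage class `gp2` and the size `50Gi` are assumptions, not values from this thread:

```python
# Sketch: Spark-on-Kubernetes (>= 3.1) settings that mount an on-demand
# PersistentVolumeClaim as executor scratch space. The "spark-local-dir-"
# volume-name prefix is what makes Spark use this volume for spark.local.dir,
# so shuffle/spill data lands on the PVC instead of ephemeral storage.
prefix = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"
local_dir_pvc_conf = {
    f"{prefix}.options.claimName": "OnDemand",  # Spark creates one PVC per executor
    f"{prefix}.options.storageClass": "gp2",    # assumed storage class; check your cluster
    f"{prefix}.options.sizeLimit": "50Gi",      # assumed size; size for your shuffle volume
    f"{prefix}.mount.path": "/data",
    f"{prefix}.mount.readOnly": "false",
}
# Each pair would be applied via SparkSession.builder.config(key, value).
```

Whether these keys can be set from inside a CML session depends on what the workspace administrators allow, so treat this as something to verify rather than a guaranteed fix.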
01-13-2023 01:43 AM
Hi Smarak, thanks for your answer. That helps me!
01-12-2023 05:08 AM
In CDP Public Cloud Machine Learning, we can create a new session with reserved resources, for example 4 vCPU and 16 GiB memory. We can also create a Spark session inside the Machine Learning workbench with its own memory configuration, for example:

```python
spark = (
    SparkSession.builder
    .appName(appName)
    .config("spark.driver.memory", "16G")
    .config("spark.executor.instances", "10")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "20G")
    .getOrCreate()
)
```

My question is: how will memory be allocated to the Spark session in this case? Is the reserved resource of the Machine Learning session (4 vCPU and 16 GiB memory) the upper limit for total Spark memory usage? And how many worker nodes and executors can I configure for the Spark session?
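For scale, it helps to work out what the executor settings above request in aggregate, independently of where any limit applies. On Spark-on-Kubernetes, each executor pod requests roughly `spark.executor.memory` plus a memory overhead, which defaults to the larger of 384 MiB and 10% of executor memory. A back-of-the-envelope sketch (the 10% factor is Spark's default for JVM jobs and can be overridden with `spark.executor.memoryOverhead`):

```python
# Back-of-the-envelope: per-executor pod memory request on Spark-on-Kubernetes.
# Default overhead = max(384 MiB, 0.10 * executor memory).
def executor_pod_memory_mib(executor_memory_mib, overhead_factor=0.10):
    overhead = max(384, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead

per_exec = executor_pod_memory_mib(20 * 1024)  # spark.executor.memory = 20G
total = 10 * per_exec                          # spark.executor.instances = 10
print(per_exec, total)  # 22528 MiB per executor, 225280 MiB (~220 GiB) total
```

So the configuration shown requests about 220 GiB across executors (plus the 16 GiB driver), which is far beyond the 16 GiB reserved for the session itself; whether the session reservation caps only the driver/engine pod or the executors too is exactly the question above.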