
yarn-client vs yarn-cluster: Spark Driver memory differences


I'm running some benchmarks and also looking at memory usage in Spark. While running my Spark application I noticed that the memory of the Spark driver process differs depending on the deploy mode I run the app in.

The table below shows the values that the Spark UI reported for the app under Executors -> driver. I set the driver memory via the spark-submit parameter --driver-memory (a sample invocation is shown after the table).

--driver-memory   yarn-client   yarn-cluster
1g                511.1 MB      511.9 MB
2g                1247.3 MB     1140.4 MB
5g                3.4 GB        3.1 GB
10g               7.0 GB        6.4 GB
20g               14.2 GB       13.1 GB
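
For reference, a minimal invocation of this kind looks like the following (the application class and JAR are placeholders; only the deploy mode changes between the two runs, and on older Spark versions the equivalent shorthand is --master yarn-client / --master yarn-cluster):

# placeholder class and JAR; switch --deploy-mode to "cluster" for the second column
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 1g \
  --class com.example.Benchmark \
  benchmark.jar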

I found a calculation on the Internet (https://0x0fff.com/spark-memory-management/) saying that

driverMemory = (driver.memory * scalaOverhead - SystemReserved) * memoryFraction
             = (driver.memory * ~0.96 - 300) * 0.75

estimates the memory that is actually available to the driver process (the Spark memory, i.e. the heap without the User Memory).
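
To check the numbers, here is a small, self-contained sketch of that estimate (this is not Spark's internal code; the ~0.96 factor and the 300 MB reserved size come from the linked article, and 0.75 is the default spark.memory.fraction in Spark 1.6):

object DriverMemoryEstimate {
  val ScalaOverhead  = 0.96  // Runtime.getRuntime.maxMemory is slightly below -Xmx
  val ReservedMb     = 300.0 // fixed system-reserved memory, in MB
  val MemoryFraction = 0.75  // spark.memory.fraction (Spark 1.6 default)

  // Estimated Spark (storage + execution) memory for a given --driver-memory, in MB
  def estimateMb(driverMemoryMb: Double): Double =
    (driverMemoryMb * ScalaOverhead - ReservedMb) * MemoryFraction

  def main(args: Array[String]): Unit =
    for (gb <- Seq(1, 2, 5, 10, 20)) {
      val est = estimateMb(gb * 1024.0)
      println(f"--driver-memory ${gb}g -> estimated $est%.1f MB (${est / 1024}%.2f GB)")
    }
}

For 1g this gives (1024 * 0.96 - 300) * 0.75 ≈ 512 MB, which is very close to the 511.1 MB reported above, and the larger values line up with the yarn-client column as well.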

This calculation fits very well for me in yarn-client mode! But, as you can see in the table, the values differ between the two modes, and the gap grows as I increase the --driver-memory value.

Am I missing anything here?
