- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
spark.yarn.executor.memoryOverhead...
- Labels:
-
Apache Spark
Created 05-04-2016 04:44 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Does anyone know exactly what spark.yarn.executor.memoryOverhead is used for and why it may be using up so much space? If I could, I would love to have a peek inside this stack. Spark's description is as follows:
The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
The problem I'm having is when running spark queries on large datasets ( > 5TB), I am required to set the executor memoryOverhead to 8GB otherwise it would throw an exception and die. What is being stored in this container that it needs 8GB per container?
I've also noticed that this error doesn't occur on standalone mode, because it doesn't use YARN.
Note. My configurations for this job are:
executor memory = 15G executor cores = 5 yarn.executor.memoryOverhead = 8GB max executors = 60 offHeap.enabled = false
Created 05-04-2016 04:55 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Henry,
you may be interested by this article: http://www.wdong.org/wordpress/blog/2015/01/08/spark-on-yarn-where-have-all-my-memory-gone/
The link seems to be dead at the moment (here is a cached version: http://m.blog.csdn.net/article/details?id=50387104)
Created 05-04-2016 05:16 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. What blows my mine is this statement from the article OVERHEAD = max(SPECIFIED_MEMORY * 0.07, 384M)
If I'm allocating 8GB for memoryOverhead, then OVERHEAD = 567 MB !!
What is yarn using the other 7.5 GB for?
Created 05-04-2016 07:07 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Henry : I think that equation uses the executor memory (in your case, 15G) and outputs the overhead value.
// Below calculation uses executorMemory, not memoryOverhead math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
The goal is to calculate OVERHEAD as a percentage of real executor memory, as used by RDDs and DataFrames. I will add that when using Spark on Yarn, the Yarn configuration settings have to be adjusted and tweaked to match up carefully with the Spark properties (as the referenced blog suggests). You might also want to look at Tiered Storage to offload RDDs into MEM_AND_DISK, etc.
Created 05-04-2016 07:12 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Henry:
Since you are requesting 15G for each executor, you may want to increase the size of Java Heap space for the Spark executors, as allocated using this parameter:
spark.executor.extraJavaOptions='-Xmx24g'
Created 11-17-2017 04:59 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Per recent Spark docs, you can't actually set the heap size that way. You need to use `spark.executor.memory` to do so. https://spark.apache.org/docs/2.1.1/configuration.html#runtime-environment
,Just an FYI, Spark 2.1.1 doesn't allow setting the heap space in `extraJavaOptions`:
java.lang.Exception: spark.executor.extraJavaOptions is not allowed to specify max heap memory settings (was ''-Xmx20g''). Use spark.executor.memory instead.