
Some questions about Spark Memory Manager

1. What is UserMemory?
From https://0x0fff.com/spark-memory-management/, I know there is a region called `UserMemory` in the Spark memory model, but I don't understand what it is used for. My guess is that it stores the objects created by our user-defined functions.
For example, given this function:

rdd.map { item =>
  // tmpObj is a temporary object created inside our user-defined function
  val tmpObj = new String("Um")
  item
}


Is `tmpObj` allocated in UserMemory?

 

2. What is the size of UserMemory?
From the above URL, I know the size of `UserMemory` is `(Java Heap - Reserved Memory) * (1 - spark.memory.fraction)`. But what happens if I allocate too many objects in UserMemory? I mean, if UserMemory's size is 100MB but my function allocates 200MB of objects, will there be an OOM?
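
To make the numbers concrete, here is that arithmetic as a small sketch (assuming the fixed 300MB Reserved Memory and the spark.memory.fraction = 0.6 default of Spark 2.x; all the names in it are mine, not Spark's):

object MemorySizes {
  def main(args: Array[String]): Unit = {
    val mb             = 1024L * 1024L
    val javaHeap       = 1024 * mb  // e.g. --executor-memory 1g
    val reservedMemory = 300 * mb   // fixed at 300MB in UnifiedMemoryManager
    val memoryFraction = 0.6        // spark.memory.fraction

    val usableMemory = javaHeap - reservedMemory
    val sparkMemory  = (usableMemory * memoryFraction).toLong // Storage + Execution
    val userMemory   = usableMemory - sparkMemory             // UserMemory

    println(s"Spark memory: ${sparkMemory / mb} MB, UserMemory: ${userMemory / mb} MB")
  }
}

With a 1GB heap this splits the ~724MB of usable memory into roughly 434MB of Spark memory and 289MB of UserMemory.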

Also, I have not seen any code about UserMemory in Spark 2.4.0-SNAPSHOT.

I checked `UnifiedMemoryManager` in Spark 2.4.0-SNAPSHOT and found that when memory is acquired (`acquireMemory`), the decision is always based on the configured storage/execution pool sizes, not on the memory that is actually free. For example, suppose storage/execution memory plus UserMemory totals 600MB: Storage Memory is 250MB, Execution Memory is 250MB, and UserMemory is 100MB. Now suppose UserMemory actually occupies 200MB, Storage Memory occupies 250MB, and Execution Memory occupies 150MB. If `UnifiedMemoryManager` then wants to allocate execution memory, it sees 100MB available, but in reality no memory is free at all. So how does Spark solve this problem?
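
To illustrate the problem, here is a simplified, hypothetical model of that bookkeeping (not Spark's actual API): a pool only tracks the bytes it has granted itself and never asks the JVM how much heap is really free.

object PoolSketch {
  class MemoryPool(val poolSize: Long) {
    private var used = 0L
    def memoryFree: Long = poolSize - used

    // Grants up to `memoryFree` bytes. Objects allocated by user code on
    // the same heap never show up in `used`, so the pool can promise
    // memory that the JVM no longer has.
    def acquire(numBytes: Long): Long = {
      val granted = math.min(numBytes, memoryFree)
      used += granted
      granted
    }
  }

  def main(args: Array[String]): Unit = {
    val mb        = 1024L * 1024L
    val execution = new MemoryPool(250 * mb)
    execution.acquire(150 * mb)        // tasks currently hold 150MB
    println(execution.memoryFree / mb) // prints 100, even if UserMemory
                                       // has already overflowed the heap
  }
}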

 

3. Where does the reducer allocate objects during a shuffle?
I guess it is UserMemory.
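
To make the question concrete, here is a minimal job whose reduce side has to buffer per-key state during the shuffle; where those buffers are accounted for is exactly what I am asking.

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("shuffle-sketch").setMaster("local[*]"))

    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, 1L))
    // reduceByKey triggers a shuffle; the reduce side merges the values
    // for each key into in-memory buffers before producing the result
    val counts = pairs.reduceByKey(_ + _)
    counts.collect().foreach(println)

    sc.stop()
  }
}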

 

4. What is the relationship between the Spark memory model and the Java heap?
The Java heap is divided into a Young Generation, an Old Generation, and a Permanent Generation. Where do Storage Memory, Execution Memory, and UserMemory live? In the Young Generation? The Old Generation? Or across generations? I guess they span generations.
But assume this case (ignoring Spark's Reserved Memory):
In the heap, Young Generation : Old Generation : Permanent Generation = 200MB : 300MB : 100MB. In Spark, Storage Memory : Execution Memory : UserMemory = 250MB : 250MB : 100MB. Suppose Storage/Execution Memory always holds big objects (for example, an `ExternalAppendOnlyMap` value that is a 100MB String), so with `-XX:PretenureSizeThreshold=10m` they are always allocated directly in the Old Generation. Now suppose Spark wants to allocate enough such objects in Storage/Execution Memory to occupy 450MB, and that these objects are never garbage collected.
In this case, what will Spark do? Although Storage Memory + Execution Memory = 500MB, the Old Generation is only 300MB. OOM? That seems odd: we are guaranteed 500MB for Storage/Execution Memory, yet an OOM happens when we have used only 450MB.
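
For reference, the generation split in this scenario could be forced with executor JVM options along these lines (a hypothetical sketch; the sizes are illustrative, and -XX:MaxPermSize only exists on Java 7 and earlier):

import org.apache.spark.SparkConf

object GcSizingSketch {
  // 500MB heap = 200MB Young Generation + 300MB Old Generation;
  // the 100MB Permanent Generation sits outside -Xmx on Java 7
  val conf = new SparkConf()
    .set("spark.executor.memory", "500m")
    .set("spark.executor.extraJavaOptions",
      "-XX:NewSize=200m -XX:MaxNewSize=200m " + // Young Generation = 200MB
      "-XX:MaxPermSize=100m " +                 // Permanent Generation = 100MB
      "-XX:PretenureSizeThreshold=10485760")    // objects over 10MB go straight to Old Gen
}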
