I am getting the typical error:
Container [pid=18542,containerID=container_e75_1537176390063_0001_01_000001] is running beyond physical memory limits. Current usage: 12.6 GB of 12 GB physical memory used; 19.0 GB of 25.2 GB virtual memory used.
But I do not have 12 GB configured anywhere in Ambari, neither for YARN nor for MapReduce2. Where does that value come from?
The following article explains the issue and its remedy in detail:
is running beyond physical memory limits .....
I am running it as the admin user with spark-submit:
export PYTHONIOENCODING=utf8; time spark-submit -v --master yarn --deploy-mode cluster --driver-memory 8G --conf spark.network.timeout=10000000 --conf spark.executor.heartbeatInterval=1000000 --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.default.parallelism=2200 --conf spark.sql.shuffle.partitions=2200 --conf spark.driver.maxResultSize="4G" test.py
When I reduced yarn.scheduler.minimum-allocation-mb from 4G to 1G, the error changed from 12 GB to:
is running beyond physical memory limits. Current usage: 10.3 GB of 9 GB physical memory used
So how are those limits calculated?
In Spark, spark.driver.memoryOverhead is added to the driver memory when calculating the total memory requested for the driver container. By default it is 10% of the driver memory, with a minimum of 384 MB. In your case the request is 8 GB + 8 GB * 0.10 = 8192 MB + 819 MB = 9011 MB ~= 9G.
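A minimal sketch of that arithmetic in Python, assuming spark.driver.memoryOverhead is not set explicitly (so the default 10% / 384 MB rule applies). The function name driver_container_mb is hypothetical, just for illustration:

```python
def driver_container_mb(driver_memory_mb: int,
                        overhead_factor: float = 0.10,
                        min_overhead_mb: int = 384) -> int:
    """Memory requested for the driver container: driver memory plus overhead.

    Assumes the default overhead rule (10% of driver memory, at least 384 MB);
    an explicit spark.driver.memoryOverhead setting would replace this.
    """
    overhead_mb = max(int(driver_memory_mb * overhead_factor), min_overhead_mb)
    return driver_memory_mb + overhead_mb


if __name__ == "__main__":
    # --driver-memory 8G from the spark-submit command above
    print(driver_container_mb(8 * 1024))  # 8192 + 819 = 9011
```

For small drivers the 384 MB floor dominates, e.g. a 2 GB driver requests 2048 + 384 MB rather than 2048 + 204 MB.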
YARN allocates container memory only in multiples of yarn.scheduler.minimum-allocation-mb.
When yarn.scheduler.minimum-allocation-mb=4G, it can only allocate container sizes of 4G, 8G, 12G, etc. So if about 9G is requested, it rounds up to the next multiple and allocates a 12G container for the driver.
When yarn.scheduler.minimum-allocation-mb=1G, container sizes of 8G, 9G, 10G, etc. are possible, so the request is rounded up to the nearest available size, 9G.
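The rounding can be sketched like this in Python. It is a simplified model: it only rounds up to a multiple of yarn.scheduler.minimum-allocation-mb and ignores the maximum allocation cap and any scheduler-specific increment settings. The function name yarn_container_mb is hypothetical:

```python
def yarn_container_mb(request_mb: int, min_allocation_mb: int) -> int:
    """Round a memory request up to the next multiple of the minimum allocation.

    Simplified model: no yarn.scheduler.maximum-allocation-mb cap and no
    scheduler-specific increment handling.
    """
    # Ceiling division, then scale back up to a multiple of the step
    steps = (request_mb + min_allocation_mb - 1) // min_allocation_mb
    return steps * min_allocation_mb


if __name__ == "__main__":
    request = 9011  # 8 GB driver + 819 MB overhead
    print(yarn_container_mb(request, 4096))  # 12288 MB = 12 GB
    print(yarn_container_mb(request, 1024))  # 9216 MB = 9 GB
```

Both results match the limits in the two error messages: 12 GB with a 4G minimum allocation and 9 GB with a 1G minimum allocation.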