I have a cluster of 4 physical nodes, each with a small amount of RAM and 8 CPU cores.
I noticed that Spark automatically splits each node's RAM between its CPU cores. As a result, I get a memory error.
I'm working with big data structures, and I want each executor to have the entire RAM of its physical node (otherwise I get a memory error).
I tried setting 'yarn.nodemanager.resource.cpu-vcores 1' in the 'yarn-site.xml' file and 'spark.driver.cores 1' in spark-defaults.conf, without any success.
If you would like to set this manually:
On the Spark side (spark-defaults.conf),
I think you need to set spark.executor.memory, which is the heap size of each executor.
If you would like to run a fixed number of executors (for example, one), set spark.executor.instances.
The Apache Spark configuration page might be useful.
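As a rough sketch of how a value for spark.executor.memory could be chosen: on YARN, each executor container needs the executor heap plus an off-heap overhead, which by default is the larger of 384 MB and 10% of the heap. The helper below (a hypothetical function, not part of Spark) finds the largest heap that still fits in a container of a given size. The 8192 MB container size is an assumed example value, and spark.executor.cores is an extra property not mentioned above.

```python
def executor_memory_mb(container_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Largest executor heap (MB) such that heap + overhead <= container_mb.

    Overhead is modelled as max(min_overhead_mb, overhead_fraction * heap),
    which mirrors Spark's default rule for executors on YARN.
    """
    if container_mb <= min_overhead_mb:
        raise ValueError("container is too small to hold the minimum overhead")
    # First assume the proportional overhead applies.
    heap = int(container_mb / (1 + overhead_fraction))
    # For small heaps the fixed minimum overhead is larger, so use that instead.
    if heap * overhead_fraction < min_overhead_mb:
        heap = container_mb - min_overhead_mb
    return heap


# Assumed example: 4 nodes, each offering 8192 MB to YARN, one executor per node.
heap = executor_memory_mb(8192)
settings = {
    "spark.executor.memory": f"{heap}m",
    "spark.executor.instances": "4",  # one per node on a 4-node cluster
    "spark.executor.cores": "8",      # assumption: let that executor use all 8 cores
}

# Print the lines in spark-defaults.conf format.
for key, value in settings.items():
    print(key, value)
```

Note that spark.executor.instances only fixes the number of executors; they end up one per node only if each container is too large for two of them to fit on the same node.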
On the YARN side (yarn-site.xml),
the total memory that containers may use on each node is set by yarn.nodemanager.resource.memory-mb.
The maximum memory for a single container request is set by yarn.scheduler.maximum-allocation-mb.
The Cloudera page might be useful.
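To see how these two YARN limits interact with the Spark setting, here is a small sanity check, written as a sketch with hypothetical helper names and assumed example values (8192 MB per node). It estimates the container size for a given spark.executor.memory using Spark's default overhead rule (the larger of 384 MB and 10% of the heap), then checks it against both YARN properties and counts how many such executors would fit on one node.

```python
def container_mb(executor_memory_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Approximate container size: executor heap plus default memory overhead."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead


def executors_per_node(executor_memory_mb, node_memory_mb):
    """How many executor containers fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // container_mb(executor_memory_mb)


def fits_yarn(executor_memory_mb, node_memory_mb, max_allocation_mb):
    """True if one executor container respects both YARN memory limits."""
    size = container_mb(executor_memory_mb)
    return size <= max_allocation_mb and size <= node_memory_mb


# Assumed example values, all in MB.
node_memory = 8192     # yarn.nodemanager.resource.memory-mb
max_allocation = 8192  # yarn.scheduler.maximum-allocation-mb

for heap in (1024, 7447):
    print(
        heap,
        container_mb(heap),
        executors_per_node(heap, node_memory),
        fits_yarn(heap, node_memory, max_allocation),
    )
```

If yarn.scheduler.maximum-allocation-mb is left lower than the container size, the request is rejected no matter how much memory the node has, so both values need to be raised together. YARN may also round container sizes up, which this sketch ignores.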