
Spark + YARN cluster: how can I configure a physical node to run only one executor/task at a time?

New Contributor

Hi,

I have an environment of 4 physical nodes, each with a small amount of RAM and 8 CPU cores.
I noticed that Spark automatically splits the RAM across the CPU cores, and the result is a memory error.
I'm working with big data structures, and I want each executor to have the entire RAM of its physical node (otherwise I'll get a memory error).
I tried setting 'yarn.nodemanager.resource.cpu-vcores' to 1 in yarn-site.xml, and 'spark.driver.cores 1' in spark-defaults.conf, without any success.

1 REPLY

Re: Spark + YARN cluster: how can I configure a physical node to run only one executor/task at a time?

Explorer

Hi,

 

If you would like to set this manually:

 

On the Spark side (spark-defaults.conf),

I think you need to set spark.executor.memory.

If you would like to run only one executor, set spark.executor.instances.
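
As a rough sketch, assuming 4 nodes with about 8 GB of RAM each (the values below are placeholders, adjust them to your hardware), spark-defaults.conf could look like this:

    # one executor per physical node (4 nodes in the cluster)
    spark.executor.instances  4
    # let each executor use all 8 cores of its node
    spark.executor.cores      8
    # most of the node's RAM; leave headroom for the OS and YARN overhead
    spark.executor.memory     6g

Each executor then asks YARN for 6g plus some off-heap overhead in a single container, so YARN must be allowed to grant that much per container (see the yarn-site.xml part below).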

 

The Apache Spark documentation might be useful:

https://spark.apache.org/docs/1.5.0/running-on-yarn.html

https://spark.apache.org/docs/1.5.0/configuration.html

 

On the YARN side (yarn-site.xml),

the total memory YARN can allocate on a node is yarn.nodemanager.resource.memory-mb,

and the maximum size of a single container is yarn.scheduler.maximum-allocation-mb.
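
For example, again assuming roughly 8 GB of physical RAM per node (placeholder values, adjust to your machines), yarn-site.xml could contain:

    <property>
      <!-- total RAM YARN may hand out on this node -->
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>7168</value>
    </property>
    <property>
      <!-- largest single container; matching the node total lets one executor take it all -->
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>7168</value>
    </property>

Restart the ResourceManager and NodeManagers after editing yarn-site.xml so the new limits take effect.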

 

The Cloudera YARN tuning page might be useful:

http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_yarn_tuning.html