Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

num-executors and executor-memory in Spark

Highlighted

num-executors and executor-memory in Spark

Expert Contributor

Can someone explain following lines about num-executors and executor-memory

Below written statement on hortonworks site (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_spark-guide/content/ch_tuning-spark.html#spark-job-status):

There are tradeoffs between num-executors and executor-memory. Large executor memory does not imply better performance, due to JVM garbage collection. Sometimes it is better to configur a larger number of small JVMs than a small number of large JVMs.

1 REPLY 1
Highlighted

Re: num-executors and executor-memory in Spark

Lets try to go through the anatomy of spark on Yarn

1. YARN containers are spawned on Physcial nodes
2. Spark AM request for fixed number of YARN container and run Spark tasks(Map, reduce, filter) within them known as executors
3. Each task run as thread with the executor.
4. Container -> spark executor -> task threads.

5. depending on the the input param executor-memory each executor is spawned as a JVM with the defined heap size.

6. This JVM memory is used for running the Task threads and keeping data in-memory too. GC is also happening because of memory allocation and de-allocation

So overall executor are JVM running multiple threads, and the question boils down to , will a JVM of infinite size has best performance . And the answer is no , because JVM performance is overall factor of wise memory allocation to minimize Garbage collection.

Huge JVM starts to perform better initially. as the process starts there is huge amount of memory available for allocation , eventually memory starts getting full and more time is spend on doing GC. classical Bell curve case of performance and memory.

Huge JVM with time results into de-fragmentation and long pause time occurs when full GC happens. The solution is to have a fine balance between num-executor and executor-memory

Don't have an account?
Coming from Hortonworks? Activate your account here