How to decide spark submit configurations


I want to know how I should decide on --executor-cores, --executor-memory, and --num-executors, given a cluster configuration of 40 nodes, 20 cores per node, and 100 GB of memory per node.

I have data in a 2 GB file and am performing filter and aggregation operations on it.

What values should these parameters be given in the spark-submit command, and how will it work?

(I don't want to use dynamic allocation for this particular case.)

1 ACCEPTED SOLUTION

Contributor

Number of cores = the number of concurrent tasks an executor can run.

So we might think that more concurrent tasks per executor means better performance. But in practice, running more than 5 concurrent tasks per executor tends to hurt performance (largely because of degraded HDFS I/O throughput). So stick to 5.

Coming to the next step: with 5 cores per executor, and 19 cores available per node (leaving 1 of the 20 cores for the OS and Hadoop daemons), we get 19 / 5 = ~4 executors per node.

So the memory for each executor is 98 / 4 = ~24 GB (leaving ~2 GB of the node's 100 GB for the OS and daemons).

Now account for the YARN executor memory overhead (spark.yarn.executor.memoryOverhead, here taken as max(384 MB, 7% of executor memory)): 0.07 * 24 GB = 1.68 GB.

Since 1.68 GB > 384 MB, the overhead is 1.68 GB.

Subtract that overhead from the 24 GB per executor: 24 - 1.68 ~ 22 GB for --executor-memory.
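Putting those numbers together for this 40-node cluster, a sketch of the resulting spark-submit call might look like the one below. The figures are illustrative, not prescriptive: reserving one executor's worth of resources for the YARN ApplicationMaster gives roughly 4 * 40 - 1 = 159 executors, and the deploy mode and application file are assumptions/placeholders rather than something fixed by the calculation above.

# Per node: 20 cores / 100 GB, minus ~1 core and ~2 GB for the OS and Hadoop daemons
# 5 cores per executor -> 19 / 5 ~ 4 executors per node -> ~160 in the cluster, minus 1 for the ApplicationMaster
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 159 \
  --executor-cores 5 \
  --executor-memory 22G \
  <your-application-jar-or-py-file> [application arguments]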


4 REPLIES



Thank you @Vikas Srivastava for your inputs, but I would like to know how my input data size affects my configuration. Considering that other jobs will also be running in the cluster, I want to use just enough resources for my 2 GB input.

Contributor

In your case, if you run it on YARN, you can even use the minimum of 1G per executor, like this:

--master yarn-client --executor-memory 1G --executor-cores 2 --num-executors 12

You can increase the number of executors to speed it up further 🙂
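For a sense of scale: assuming the 2 GB file sits in HDFS with the default 128 MB block size, it splits into roughly 2048 / 128 = 16 partitions, so each stage has about 16 tasks; 12 executors with 2 cores each give 24 task slots, enough to run a whole stage in a single wave. A full command along those lines might look like the sketch below (written with the equivalent --master yarn --deploy-mode client form; the application file is a placeholder):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 12 \
  --executor-cores 2 \
  --executor-memory 1G \
  <your-application-jar-or-py-file> [application arguments]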

Cloudera Employee

Hi,

 

Hope the links below help in deciding the configurations, in addition to the previous comments:

 

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/

 

Thanks

AKR