
How to decide spark submit configurations


I want to know how I should decide on --executor-cores, --executor-memory, and --num-executors, given my cluster configuration: 40 nodes, 20 cores each, 100 GB of memory each.

I have data in a 2 GB file and am performing filter and aggregation operations on it.

What values should these parameters be given in the spark-submit command, and how will it work?

(I don't want to use dynamic allocation for this particular case.)


4 REPLIES

Re: How to decide spark submit configurations (accepted solution)

Contributor

Number of cores = the number of concurrent tasks an executor can run.

So we might think that more concurrent tasks per executor will give better performance. But experience shows that applications with more than 5 concurrent tasks per executor tend to perform poorly, so cap this at 5 cores per executor.

Moving to the next step: with 5 cores per executor and 19 cores available per node (leaving 1 of the 20 cores for the OS and Hadoop daemons), we get 19 / 5 ≈ 4 executors per node.

So memory per executor is 98 / 4 = ~24 GB (leaving ~2 GB of each node's 100 GB for the OS and Hadoop daemons).

Next, subtract the YARN memory overhead, which is the larger of 384 MB and ~7% of the memory: 0.07 * 24 GB (the per-executor budget calculated above) = 1.68 GB.

Since 1.68 GB > 384 MB, the overhead is 1.68 GB.

Subtracting that overhead from the 24 GB above: 24 - 1.68 ≈ 22 GB, so set --executor-memory to roughly 22 GB.
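Putting those numbers together, a rough sketch of the resulting spark-submit options for this cluster could look like the line below. The --num-executors value is an assumption on my part: 4 executors on each of the 40 nodes gives 160, minus 1 container left free for the YARN application master. Treat it as a starting point and adjust for whatever else is running on the cluster.

--master yarn-client --executor-cores 5 --executor-memory 22G --num-executors 159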



Re: How to decide spark submit configurations

Thank you @Vikas Srivastava for your inputs, but I would like to know how my input data size will affect my configuration, considering that other jobs will also be running in the cluster and I want to request only enough resources for my 2 GB input.


Re: How to decide spark submit configurations

Contributor

In your case, if you run it on YARN, you can use as little as 1 GB of executor memory, like this:

--master yarn-client --executor-memory 1G --executor-cores 2 --num-executors 12
You can increase the number of executors to improve it further. :)
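For example, a complete command along those lines might look like the following (the --class value and the jar name are placeholders for your own application):

spark-submit --master yarn-client --executor-memory 1G --executor-cores 2 --num-executors 12 --class com.example.YourApp your-app.jar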


Re: How to decide spark submit configurations

Cloudera Employee

Hi,

 

Hope the links below help in deciding the configurations, in addition to the previous comments:

 

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/

 

Thanks

AKR