How to decide spark-submit configurations

I want to know how I should decide on --executor-cores, --executor-memory, and --num-executors, given a cluster configuration of 40 nodes, 20 cores each, and 100 GB of memory each.

I have data in a 2 GB file and am performing filter and aggregation operations on it.

What values should these parameters be given in the spark-submit command, and how will it work?

(I don't want to use dynamic allocation for this particular case.)

4 REPLIES


Thank you @Vikas Srivastava for your inputs, but I would like to know how my input data size affects my configuration. Considering that other jobs will also be running on the cluster, I want to request only enough resources for my 2 GB input.

Contributor

In your case, if you run it on YARN, you can use as little as 1G per executor, like this:

--master yarn-client --executor-memory 1G --executor-cores 2 --num-executors 12
You can increase the number of executors to improve it further 🙂
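
For reference, a complete spark-submit invocation using these values might look like the sketch below; the class name and jar are placeholders, not from your job. With the default 128 MB HDFS block size (an assumption), a 2 GB input splits into roughly 16 partitions, so 12 executors x 2 cores = 24 task slots already covers the filter and aggregation comfortably.

# Placeholder class and jar names; replace with your application's.
spark-submit \
  --master yarn-client \
  --executor-memory 1G \
  --executor-cores 2 \
  --num-executors 12 \
  --class com.example.FilterAggJob \
  my-spark-job.jar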

Cloudera Employee

Hi,

Hope the links below help in deciding the configurations, in addition to the previous comments:

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/
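
As a rough sketch, applying the sizing heuristic from part 2 of those posts to your 40-node, 20-core, 100 GB cluster gives the ceiling below; the per-node reservations and the ~7% overhead figure are assumptions mirrored from that post's worked example, not measurements from your cluster:

# Assumed heuristic (mirroring the worked example in part 2):
#   reserve 1 core + 1 GB per node for OS/Hadoop daemons -> 19 cores, 99 GB usable
#   ~5 cores per executor for good HDFS throughput       -> 19 / 5 = 3 executors per node
#   40 nodes * 3 executors, minus 1 for the YARN AM      -> 119 executors
#   99 GB / 3 executors, minus ~7% YARN memory overhead  -> ~30 GB per executor
--master yarn-client --num-executors 119 --executor-cores 5 --executor-memory 30G

For a small 2 GB input this is far more than needed; it is the upper bound the heuristic gives when a single job should use the whole cluster.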

 

Thanks

AKR