Spark Submit - Spark Parameter Setting

Hi Team,

 

Our Hadoop cluster has the following configuration:

#1 Cluster nodes:

> NodeManagers: 166

> DataNodes: 159

#2 64 cores per node

#3 503 GB RAM per node

 

Given these node and core counts, I want to set the following spark-submit parameters:

--driver-memory
--driver-cores
--num-executors
--executor-memory
--executor-cores

Please suggest how to calculate them, and share the calculation logic.

 

My second question: in a shell script, we call the .py Python code with the following Spark parameters:

spark-submit
--conf spark.maxRemoteBlockSizeFetchToMem=2G
--conf hive.exec.dynamic.partition=true
--conf hive.enforce.bucketing=true
--conf hive.exec.dynamic.partition.mode=nonstrict
--master yarn
--deploy-mode client
--driver-memory 30G 
--driver-cores 4 
--num-executors 99
--executor-memory 40G 
--executor-cores 4 
--conf spark.sql.shuffle.partitions=800
--conf spark.shuffle.compress=true
--conf spark.port.maxRetries=100
--conf spark.shuffle.spill.compress=true
--conf spark.driver.maxResultSize=8g
--conf spark.broadcast.compress=true
--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true
--conf spark.yarn.executor.memoryOverhead=4G
--conf spark.hive.mapred.supports.subdirectories=true
--conf spark.shuffle.io.maxRetries=50
--conf spark.shuffle.io.retryWait=60s
--conf spark.reducer.maxReqsInFlight=1

 

The Python job takes 5-6 hours to run. Could someone please suggest how to tune it at the Spark parameter level?

 

Thanks in advance for your support.

3 REPLIES

Rising Star

@pankshiv1809 

You can review the blog posts below for guidance on tuning Spark applications. For your case, you need to tune the executor and driver memory and cores, along with the other parameters the posts discuss.

 

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
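
As a rough illustration of the kind of sizing heuristic those posts walk through, here is a minimal Python sketch applied to the cluster figures in the question (166 NodeManagers, 64 cores and 503 GB per node). The function name `size_executors` and its defaults are assumptions for illustration, not values taken from your cluster: 5 cores per executor, 1 core and 1 GB per node reserved for the OS and Hadoop daemons, a 10% memory overhead, and one executor slot left free for the YARN ApplicationMaster. It also assumes YARN may hand out all remaining cores and memory on each node, which in practice is capped by yarn.nodemanager.resource.cpu-vcores, yarn.nodemanager.resource.memory-mb, and your queue limits.

```python
import math


def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1,
                   reserved_mem_gb=1, overhead_fraction=0.10):
    """Estimate an upper bound for spark-submit executor settings.

    All defaults are illustrative assumptions, not cluster facts.
    """
    # Leave some cores and memory on every node for the OS and daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb

    # How many executors of the chosen width fit on one node.
    executors_per_node = usable_cores // cores_per_executor

    # Each YARN container must hold the executor heap plus its overhead.
    container_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(container_gb / (1 + overhead_fraction))

    # Spark applies a 384 MB floor to the executor memory overhead.
    memory_overhead_mb = max(384, int(executor_memory_gb * 1024 * overhead_fraction))

    # Leave one executor slot free for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1

    return {
        "executor_cores": cores_per_executor,
        "executors_per_node": executors_per_node,
        "executor_memory_gb": executor_memory_gb,
        "memory_overhead_mb": memory_overhead_mb,
        "num_executors": num_executors,
    }


# Figures from the question: 166 NodeManagers, 64 cores, 503 GB per node.
plan = size_executors(nodes=166, cores_per_node=64, mem_gb_per_node=503)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under these assumptions the sketch gives 12 executors per node, 38 GB of executor memory each, and an upper bound of 1991 executors if the job had the whole cluster to itself. For comparison, the command in the question requests 99 executors with 4 cores each, that is 396 of the cluster's 10,624 cores, so there may be room to request more, subject to what your queue allows and whether the job has enough partitions to keep the extra cores busy.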

 

Thanks!

 

Rising Star

@pankshiv1809 was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
