
Spark Submit - Spark Parameter Setting

Contributor

Hi Team,

 

I have the following Hadoop cluster details in our environment:

#1 Cluster nodes:

>NodeManagers: 166

>DataNodes: 159

#2 64 cores per node

#3 503 GB RAM per node

 

Based on the node and core details above, I want to set the following spark-submit parameters:

--driver-memory
--driver-cores
--num-executors
--executor-memory
--executor-cores

Please suggest how to calculate these values, and share the calculation logic as well.

 

Also, question #2: in a shell script we call the .py Python code with the following Spark parameters:

spark-submit \
--conf spark.maxRemoteBlockSizeFetchToMem=2G \
--conf hive.exec.dynamic.partition=true \
--conf hive.enforce.bucketing=true \
--conf hive.exec.dynamic.partition.mode=nonstrict \
--master yarn \
--deploy-mode client \
--driver-memory 30G \
--driver-cores 4 \
--num-executors 99 \
--executor-memory 40G \
--executor-cores 4 \
--conf spark.sql.shuffle.partitions=800 \
--conf spark.shuffle.compress=true \
--conf spark.port.maxRetries=100 \
--conf spark.shuffle.spill.compress=true \
--conf spark.driver.maxResultSize=8g \
--conf spark.broadcast.compress=true \
--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
--conf spark.yarn.executor.memoryOverhead=4G \
--conf spark.hive.mapred.supports.subdirectories=true \
--conf spark.shuffle.io.maxRetries=50 \
--conf spark.shuffle.io.retryWait=60s \
--conf spark.reducer.maxReqsInFlight=1

 

The Python job takes 5-6 hours to execute. Could someone please suggest how to tune the job at the Spark parameter level? Please guide me if possible.

 

Thanks in advance for your kind support always.

3 REPLIES

Expert Contributor

@pankshiv1809 

You can review the blogs below for tuning Spark applications. For your case, you will need to tune executor and driver memory and cores, along with the other parameters covered in these posts.

 

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/

https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
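The sizing heuristic from those posts (reserve a core and a little memory on each node for the OS and Hadoop daemons, aim for roughly 5 cores per executor, and subtract the off-heap memory overhead from each executor's allocation) can be sketched roughly as below. The function name and the default values here are illustrative assumptions, not anything defined by Spark itself:

```python
def suggest_executor_config(nodes, cores_per_node, mem_per_node_gb,
                            cores_per_executor=5, reserved_cores=1,
                            reserved_mem_gb=1, overhead_fraction=0.07):
    """Rough executor sizing heuristic (assumed defaults, adjust per workload).

    Returns (num_executors, executor_memory_gb, executor_cores).
    """
    # Cores left for executors after reserving some for OS/Hadoop daemons.
    usable_cores = cores_per_node - reserved_cores
    executors_per_node = usable_cores // cores_per_executor

    # Subtract one executor cluster-wide for the driver / application master.
    num_executors = executors_per_node * nodes - 1

    # Split usable node memory across executors, then carve out the
    # memory overhead (spark.yarn.executor.memoryOverhead) from each.
    mem_per_executor = (mem_per_node_gb - reserved_mem_gb) / executors_per_node
    executor_memory_gb = int(mem_per_executor * (1 - overhead_fraction))

    return num_executors, executor_memory_gb, cores_per_executor


# With the cluster described above: 166 NodeManagers, 64 cores, 503 GB RAM.
n, mem, cores = suggest_executor_config(166, 64, 503)
print(f"--num-executors {n} --executor-memory {mem}G --executor-cores {cores}")
```

With those numbers this works out to about 12 executors per node. Whether you should actually request all of them depends on how much of the cluster your queue is allowed to use; the heuristic gives an upper bound, not a prescription.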

 

Thanks!

 

Contributor

Expert Contributor

@pankshiv1809 was your question answered? If so, please mark the relevant reply as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs-up button.