We have the following Hadoop server details in our environment:
#1 1-node cluster (working)
#2 64 cores per node
#3 503 GB RAM per node
Based on the node and core details above, I want to set the spark-submit parameters --driver-memory, --driver-cores, --num-executors, --executor-memory, and --executor-cores. Please suggest how to calculate them, and also share the calculation logic.
Question #2: in a shell script we call the .py Python code with the following spark-submit parameters:
spark-submit \
  --conf spark.maxRemoteBlockSizeFetchToMem=2G \
  --conf hive.exec.dynamic.partition=true \
  --conf hive.enforce.bucketing=true \
  --conf hive.exec.dynamic.partition.mode=nonstrict \
  --master yarn \
  --deploy-mode client \
  --driver-memory 30G \
  --driver-cores 4 \
  --num-executors 99 \
  --executor-memory 40G \
  --executor-cores 4 \
  --conf spark.sql.shuffle.partitions=800 \
  --conf spark.shuffle.compress=true \
  --conf spark.port.maxRetries=100 \
  --conf spark.shuffle.spill.compress=true \
  --conf spark.driver.maxResultSize=8g \
  --conf spark.broadcast.compress=true \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.yarn.executor.memoryOverhead=4G \
  --conf spark.hive.mapred.supports.subdirectories=true \
  --conf spark.shuffle.io.maxRetries=50 \
  --conf spark.shuffle.io.retryWait=60s \
  --conf spark.reducer.maxReqsInFlight=1
The Python job takes 5-6 hours to execute. Could someone please suggest how to tune the job at the Spark-parameter level, and guide me if possible?
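One parameter in the command above worth revisiting is spark.sql.shuffle.partitions=800. A frequently used starting point is to size it so each shuffle partition handles on the order of 128 MB, using the total shuffle write shown on the job's stage page in the Spark UI. A minimal sketch, where the 400 GB figure is purely an illustrative placeholder (substitute the real shuffle size from your job's Spark UI):

```python
# Hedged starting point for spark.sql.shuffle.partitions:
# total shuffle bytes / ~128 MB per partition.

def shuffle_partitions(shuffle_bytes, target_mb=128):
    partitions = shuffle_bytes // (target_mb * 1024 * 1024)
    return max(int(partitions), 1)  # never suggest zero partitions

# Illustration only: a job shuffling 400 GB would want ~3200 partitions,
# far more than the 800 currently configured.
print(shuffle_partitions(400 * 1024**3))
```

If the real shuffle volume is small, the opposite adjustment (fewer partitions) reduces scheduling overhead; the Spark UI's per-task shuffle read size is the quickest way to tell which direction to move.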
Thanks in advance for your kind support always.
You can review the blogs below for tuning Spark applications. Based on your case, you need to tune the executor and driver memories and cores, along with the other parameters mentioned in the blogs.
AsimShaikh: Okay, I will refer to the blogs and try to tune the executor and other parameters.
@pankshiv1809 was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.