<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Spark Submit - Spark Parameter Setting in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357674#M237643</link>
    <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our Hadoop environment has the following server details:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#1 Working nodes in the cluster:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;gt; NodeManagers: 166&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;gt; DataNodes: 159&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#2 64 cores per node&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#3 503 GB RAM per node&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based on the node and core details above, I want to set the following spark-submit parameters:&lt;/P&gt;&lt;P&gt;--driver-memory&lt;BR /&gt;--driver-cores&lt;BR /&gt;--num-executors&lt;BR /&gt;--executor-memory&lt;BR /&gt;--executor-cores&lt;/P&gt;&lt;P&gt;Please suggest how to calculate these values, and share the calculation logic.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My second question: our shell script runs the Python (.py) code with the following spark-submit parameters:&lt;/P&gt;&lt;P&gt;spark-submit&lt;BR /&gt;--conf spark.maxRemoteBlockSizeFetchToMem=2G&lt;BR /&gt;--conf hive.exec.dynamic.partition=true&lt;BR /&gt;--conf hive.enforce.bucketing=true&lt;BR /&gt;--conf hive.exec.dynamic.partition.mode=nonstrict&lt;BR /&gt;--master yarn&lt;BR /&gt;--deploy-mode client&lt;BR /&gt;--driver-memory 30G&lt;BR /&gt;--driver-cores 4&lt;BR /&gt;--num-executors 99&lt;BR /&gt;--executor-memory 40G&lt;BR /&gt;--executor-cores 4&lt;BR /&gt;--conf spark.sql.shuffle.partitions=800&lt;BR /&gt;--conf spark.shuffle.compress=true&lt;BR /&gt;--conf spark.port.maxRetries=100&lt;BR /&gt;--conf spark.shuffle.spill.compress=true&lt;BR /&gt;--conf spark.driver.maxResultSize=8g&lt;BR /&gt;--conf spark.broadcast.compress=true&lt;BR /&gt;--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true&lt;BR /&gt;--conf spark.yarn.executor.memoryOverhead=4G&lt;BR /&gt;--conf spark.hive.mapred.supports.subdirectories=true&lt;BR /&gt;--conf spark.shuffle.io.maxRetries=50&lt;BR /&gt;--conf spark.shuffle.io.retryWait=60s&lt;BR /&gt;--conf spark.reducer.maxReqsInFlight=1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Python job takes 5-6 hours to run. Could someone please suggest how to tune it at the Spark parameter level?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance for your support.&lt;/P&gt;</description>
    <pubDate>Tue, 15 Nov 2022 18:11:14 GMT</pubDate>
    <dc:creator>pankshiv1809</dc:creator>
    <dc:date>2022-11-15T18:11:14Z</dc:date>
    <item>
      <title>Spark Submit - Spark Parameter Setting</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357674#M237643</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our Hadoop environment has the following server details:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#1 Working nodes in the cluster:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;gt; NodeManagers: 166&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&amp;gt; DataNodes: 159&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#2 64 cores per node&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;#3 503 GB RAM per node&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Based on the node and core details above, I want to set the following spark-submit parameters:&lt;/P&gt;&lt;P&gt;--driver-memory&lt;BR /&gt;--driver-cores&lt;BR /&gt;--num-executors&lt;BR /&gt;--executor-memory&lt;BR /&gt;--executor-cores&lt;/P&gt;&lt;P&gt;Please suggest how to calculate these values, and share the calculation logic.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My second question: our shell script runs the Python (.py) code with the following spark-submit parameters:&lt;/P&gt;&lt;P&gt;spark-submit&lt;BR /&gt;--conf spark.maxRemoteBlockSizeFetchToMem=2G&lt;BR /&gt;--conf hive.exec.dynamic.partition=true&lt;BR /&gt;--conf hive.enforce.bucketing=true&lt;BR /&gt;--conf hive.exec.dynamic.partition.mode=nonstrict&lt;BR /&gt;--master yarn&lt;BR /&gt;--deploy-mode client&lt;BR /&gt;--driver-memory 30G&lt;BR /&gt;--driver-cores 4&lt;BR /&gt;--num-executors 99&lt;BR /&gt;--executor-memory 40G&lt;BR /&gt;--executor-cores 4&lt;BR /&gt;--conf spark.sql.shuffle.partitions=800&lt;BR /&gt;--conf spark.shuffle.compress=true&lt;BR /&gt;--conf spark.port.maxRetries=100&lt;BR /&gt;--conf spark.shuffle.spill.compress=true&lt;BR /&gt;--conf spark.driver.maxResultSize=8g&lt;BR /&gt;--conf spark.broadcast.compress=true&lt;BR /&gt;--conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true&lt;BR /&gt;--conf spark.yarn.executor.memoryOverhead=4G&lt;BR /&gt;--conf spark.hive.mapred.supports.subdirectories=true&lt;BR /&gt;--conf spark.shuffle.io.maxRetries=50&lt;BR /&gt;--conf spark.shuffle.io.retryWait=60s&lt;BR /&gt;--conf spark.reducer.maxReqsInFlight=1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Python job takes 5-6 hours to run. Could someone please suggest how to tune it at the Spark parameter level?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance for your support.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Nov 2022 18:11:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357674#M237643</guid>
      <dc:creator>pankshiv1809</dc:creator>
      <dc:date>2022-11-15T18:11:14Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Submit - Spark Parameter Setting</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357767#M237662</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/100776"&gt;@pankshiv1809&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can review the blog posts below on tuning Spark applications. For your case, you need to tune the executor and driver memory and cores, along with the other parameters covered in the posts.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/" target="_blank"&gt;https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/" target="_blank"&gt;https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2022 12:20:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357767#M237662</guid>
      <dc:creator>AsimShaikh</dc:creator>
      <dc:date>2022-11-16T12:20:29Z</dc:date>
    </item>
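    <item>
      <description>
The executor and core sizing that the reply above points to can be roughed out with simple arithmetic. The sketch below is only a starting point and is not taken from the thread: the function name size_executors, the 5 cores per executor, the 1 core and 8 GB reserved per node for the OS and Hadoop daemons, and the 10 percent memory overhead are all assumptions chosen for illustration. The node count, cores per node and RAM per node come from the question.

```python
import math


def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1,
                   reserved_mem_gb=8, overhead_fraction=0.10):
    """Rough upper-bound sizing for a cluster dedicated to one job.

    All default values are illustrative assumptions, not settings
    confirmed anywhere in this thread.
    """
    # Leave some cores and memory on every node for the OS and daemons.
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb

    # How many executors of the chosen width fit on one node.
    executors_per_node = usable_cores // cores_per_executor

    # Each YARN container holds the executor heap plus its overhead,
    # so the heap is the container size divided by (1 + overhead).
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = math.floor(container_gb / (1 + overhead_fraction))

    # Keep one container slot free for the YARN application master.
    num_executors = nodes * executors_per_node - 1

    return {
        "executor_cores": cores_per_executor,
        "executors_per_node": executors_per_node,
        "executor_memory_gb": heap_gb,
        "num_executors": num_executors,
    }


if __name__ == "__main__":
    # Figures from the question: 166 NodeManagers, 64 cores, 503 GB RAM.
    sizing = size_executors(nodes=166, cores_per_node=64, mem_gb_per_node=503)
    print("--executor-cores", sizing["executor_cores"])
    print("--executor-memory", str(sizing["executor_memory_gb"]) + "G")
    print("--num-executors", sizing["num_executors"])
```

Under these assumptions the cluster in the question works out to 12 executors per node, about 37 GB of heap each, and a ceiling of 1991 executors if the whole cluster were dedicated to this one job. On a shared cluster the YARN queue limits would usually cap the job well below that ceiling, so treat the result as a bound to tune down from, not a value to submit as-is.
      </description>
    </item>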
    <item>
      <title>Re: Spark Submit - Spark Parameter Setting</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357772#M237664</link>
      <description>&lt;P&gt;&lt;A class="dcxa-lithium-link" href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/97054" target="_blank" rel="noopener"&gt;AsimShaikh: &lt;STRONG&gt;Okay, I will refer to the blog posts and try to tune the executor and other parameters.&lt;/STRONG&gt;&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2022 12:32:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357772#M237664</guid>
      <dc:creator>pankshiv1809</dc:creator>
      <dc:date>2022-11-16T12:32:14Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Submit - Spark Parameter Setting</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357863#M237685</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/100776"&gt;@pankshiv1809&lt;/a&gt;, was your question answered? Make sure to mark the answer as the accepted solution.&lt;BR /&gt;If you find a reply useful, say thanks by clicking the thumbs up button.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2022 10:09:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Spark-Submit-Spark-Parameter-Setting/m-p/357863#M237685</guid>
      <dc:creator>AsimShaikh</dc:creator>
      <dc:date>2022-11-17T10:09:44Z</dc:date>
    </item>
  </channel>
</rss>

