<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to decide spark submit configurations in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226199#M75173</link>
    <description>&lt;P&gt;Thank you &lt;A rel="user" href="https://community.cloudera.com/users/70180/eduvikassri.html" nodeid="70180"&gt;@Vikas Srivastava&lt;/A&gt; for your inputs, but I would like to know how my input data size will affect my configuration. Considering that other jobs will also be running in the cluster, I want to request only enough resources for my 2GB input.&lt;/P&gt;</description>
    <pubDate>Sun, 04 Mar 2018 12:56:35 GMT</pubDate>
    <dc:creator>pawaranand011</dc:creator>
    <dc:date>2018-03-04T12:56:35Z</dc:date>
    <item>
      <title>How to decide spark submit configurations</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226197#M75171</link>
      <description>&lt;P&gt;I want to know how I should decide on --executor-cores, --executor-memory, and --num-executors, given a cluster of 40 nodes with 20 cores and 100GB of memory each.&lt;/P&gt;&lt;P&gt;I have a 2GB input file and am performing filter and aggregation operations on it.&lt;/P&gt;&lt;P&gt;What values should these parameters be given in the spark-submit command, and how will it work?&lt;/P&gt;&lt;P&gt;(I don't want to use dynamic allocation for this particular case.)&lt;/P&gt;</description>
      <pubDate>Thu, 01 Mar 2018 11:24:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226197#M75171</guid>
      <dc:creator>pawaranand011</dc:creator>
      <dc:date>2018-03-01T11:24:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to decide spark submit configurations</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226198#M75172</link>
      <description>&lt;P&gt;Number of cores = number of concurrent tasks an executor can run.&lt;/P&gt;&lt;P&gt;So we might think that more concurrent tasks per executor means better performance, but experience shows that more than 5 concurrent tasks per executor leads to poor HDFS I/O throughput, so stick to 5.&lt;/P&gt;&lt;P&gt;Next, with 5 cores per executor and 19 cores available per node (leaving 1 core for the OS and Hadoop daemons), we get 19 / 5 ≈ 4 executors per node.&lt;/P&gt;&lt;P&gt;So memory per executor is 98 / 4 ≈ 24GB (leaving ~2GB per node for the OS).&lt;/P&gt;&lt;P&gt;The YARN memory overhead is max(0.07 × executor memory, 384MB); here 0.07 × 24 = 1.68GB.&lt;/P&gt;&lt;P&gt;Since 1.68GB &amp;gt; 384MB, the overhead is 1.68GB.&lt;/P&gt;&lt;P&gt;Subtracting that from the 24GB above gives 24 - 1.68 ≈ 22GB per executor.&lt;/P&gt;</description>
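The sizing arithmetic in the reply above can be checked with a short script. The reserved amounts (1 core and ~2GB per node for the OS and Hadoop daemons) and the 5-tasks-per-executor limit are the reply's working assumptions, not Spark defaults:

```python
# Sketch of the executor-sizing arithmetic from the reply above.
# Assumptions (from the post, not Spark defaults): 1 core and ~2GB per node
# reserved for OS/Hadoop daemons; at most 5 concurrent tasks per executor.
cores_per_node = 20
mem_per_node_gb = 100

usable_cores = cores_per_node - 1        # leave 1 core per node -> 19
usable_mem_gb = mem_per_node_gb - 2      # leave ~2GB per node  -> 98

cores_per_executor = 5                   # HDFS I/O throughput guideline
executors_per_node = round(usable_cores / cores_per_executor)  # 19/5 -> ~4

mem_per_executor_gb = usable_mem_gb / executors_per_node       # 98/4 = 24.5

# YARN memory overhead: max(7% of executor memory, 384MB), per the reply
overhead_gb = max(0.07 * mem_per_executor_gb, 0.384)
executor_memory_gb = mem_per_executor_gb - overhead_gb         # ~22GB

print(executors_per_node, mem_per_executor_gb, round(executor_memory_gb, 1))
```

This reproduces the reply's ~4 executors per node and roughly 22GB for --executor-memory.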
      <pubDate>Fri, 02 Mar 2018 02:51:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226198#M75172</guid>
      <dc:creator>edu_vikassri</dc:creator>
      <dc:date>2018-03-02T02:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to decide spark submit configurations</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226199#M75173</link>
      <description>&lt;P&gt;Thank you &lt;A rel="user" href="https://community.cloudera.com/users/70180/eduvikassri.html" nodeid="70180"&gt;@Vikas Srivastava&lt;/A&gt; for your inputs, but I would like to know how my input data size will affect my configuration. Considering that other jobs will also be running in the cluster, I want to request only enough resources for my 2GB input.&lt;/P&gt;</description>
      <pubDate>Sun, 04 Mar 2018 12:56:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226199#M75173</guid>
      <dc:creator>pawaranand011</dc:creator>
      <dc:date>2018-03-04T12:56:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to decide spark submit configurations</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226200#M75174</link>
      <description>&lt;P&gt;In your case, if you run it on YARN, you can use as little as 1G per executor, like this:&lt;/P&gt;&lt;P&gt;--master yarn-client --executor-memory 1G --executor-cores 2 --num-executors 12&lt;BR /&gt;You can increase the number of executors to improve parallelism &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
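As a rough sanity check on those flags, assuming the 2GB file sits on HDFS with the common 128MB block size (the post does not state this), the suggested allocation already covers every input partition in a single wave:

```python
# Rough check of how the suggested spark-submit flags cover a 2GB input.
# Assumption (not stated in the post): input is on HDFS with 128MB blocks,
# so Spark creates roughly one partition per block.
input_gb = 2
block_mb = 128

partitions = (input_gb * 1024) // block_mb   # 2048 / 128 = 16 tasks

num_executors = 12
executor_cores = 2
task_slots = num_executors * executor_cores  # 24 concurrent task slots

# 24 slots cover all 16 partitions in one wave, so adding executors only
# helps if later stages (e.g. after a shuffle) create more partitions.
print(partitions, task_slots)
```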
      <pubDate>Sun, 04 Mar 2018 23:13:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/226200#M75174</guid>
      <dc:creator>edu_vikassri</dc:creator>
      <dc:date>2018-03-04T23:13:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to decide spark submit configurations</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/286859#M75175</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I hope the links below help in deciding the configurations, in addition to the previous comments:&lt;/P&gt;&lt;P&gt;&lt;A href="https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/" target="_blank" rel="noopener noreferrer nofollow"&gt;https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/" target="_blank" rel="noopener noreferrer nofollow"&gt;https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;AKR&lt;/P&gt;</description>
      <pubDate>Sun, 05 Jan 2020 14:17:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-decide-spark-submit-configurations/m-p/286859#M75175</guid>
      <dc:creator>AKR</dc:creator>
      <dc:date>2020-01-05T14:17:41Z</dc:date>
    </item>
  </channel>
</rss>