<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Setting num_executors on Spark2 in Ambari in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181132#M58582</link>
    <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/14345/sreekuppa.html"&gt;Sree Kupp&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Have you tried setting num_executors=46 in the session variables of the JDBC connection string?&lt;/P&gt;&lt;P&gt;The JDBC connection URL has the following format:&lt;/P&gt;&lt;PRE&gt;jdbc:hive2://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;dbName&amp;gt;;&amp;lt;sessionConfs&amp;gt;?&amp;lt;hiveConfs&amp;gt;#&amp;lt;hiveVars&amp;gt;&lt;/PRE&gt;&lt;P&gt;Try setting the &amp;lt;sessionConfs&amp;gt; part as:&lt;/P&gt;&lt;PRE&gt;num_executors=46;&lt;/PRE&gt;&lt;P&gt;Note that this usage is neither documented nor supported by HWX or CDH.&lt;/P&gt;&lt;P&gt;I prefer to use Hive/LLAP instead:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LLAP" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LLAP&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This is supported and very promising. Additionally, with the new features included in HDP 2.6, to be released in the next few weeks, it will be generally available and a definite option for Enterprise Data Warehouse optimization and a single pane of glass for SQL on Hadoop, with ANSI SQL 2011 compliance in the very near future.&lt;/P&gt;</description>
    <pubDate>Thu, 30 Mar 2017 22:34:54 GMT</pubDate>
    <dc:creator>cstanca</dc:creator>
    <dc:date>2017-03-30T22:34:54Z</dc:date>
    <item>
      <title>Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181131#M58581</link>
      <description>&lt;P&gt;Where do I find the option to set the number of executors in Ambari?&lt;/P&gt;&lt;P&gt;When running the Spark shell, I can pass --num-executors=46 on the command line. But I want to run Spark through Beeline, and I am not sure where to set this parameter. My cluster has 4 data nodes, each with 24 processors, for a total of 96 processors. I have set SPARK_EXECUTOR_CORES=2 in "Advanced spark2-defaults" in Ambari and want to set 46 executors.&lt;/P&gt;&lt;P&gt;I tried setting "spark.dynamicAllocation.maxExecutors" to 46, but when I run the query through Beeline (client mode), it uses only 47 containers and 24.5% of the cluster.&lt;/P&gt;&lt;P&gt;How can I make the application use all 96 (or the maximum number of) containers and the full cluster?&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 21:51:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181131#M58581</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2017-03-30T21:51:54Z</dc:date>
    </item>
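The core-count arithmetic behind the question above can be sketched as a quick check (all figures, 4 data nodes with 24 cores each, SPARK_EXECUTOR_CORES=2, and 46 requested executors, come from the question itself):

```python
# Core-count arithmetic from the question: 4 data nodes x 24 cores each.
nodes = 4
cores_per_node = 24
total_cores = nodes * cores_per_node            # 96 cores in the cluster

executor_cores = 2                              # SPARK_EXECUTOR_CORES=2

# Upper bound on executors if every core were available to Spark.
max_executors = total_cores // executor_cores   # 48

# The question asks for 46, slightly below the theoretical maximum,
# leaving a few cores for other processes on the nodes.
requested = 46
cluster_utilization = requested * executor_cores / total_cores

print(max_executors)                  # 48
print(round(cluster_utilization, 3))  # 0.958
```

This is only the CPU side of the sizing; as the later replies point out, per-container memory can cap the container count well before the core count does.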
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181132#M58582</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/14345/sreekuppa.html"&gt;Sree Kupp&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Have you tried setting num_executors=46 in the session variables of the JDBC connection string?&lt;/P&gt;&lt;P&gt;The JDBC connection URL has the following format:&lt;/P&gt;&lt;PRE&gt;jdbc:hive2://&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;dbName&amp;gt;;&amp;lt;sessionConfs&amp;gt;?&amp;lt;hiveConfs&amp;gt;#&amp;lt;hiveVars&amp;gt;&lt;/PRE&gt;&lt;P&gt;Try setting the &amp;lt;sessionConfs&amp;gt; part as:&lt;/P&gt;&lt;PRE&gt;num_executors=46;&lt;/PRE&gt;&lt;P&gt;Note that this usage is neither documented nor supported by HWX or CDH.&lt;/P&gt;&lt;P&gt;I prefer to use Hive/LLAP instead:&lt;/P&gt;&lt;P&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LLAP" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LLAP&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html" target="_blank"&gt;http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_llap.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This is supported and very promising. Additionally, with the new features included in HDP 2.6, to be released in the next few weeks, it will be generally available and a definite option for Enterprise Data Warehouse optimization and a single pane of glass for SQL on Hadoop, with ANSI SQL 2011 compliance in the very near future.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 22:34:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181132#M58582</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-03-30T22:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181133#M58583</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3486/cstanca.html" nodeid="3486"&gt;@Constantin Stanca&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for your time. I am not sure I follow your solution with the JDBC connection string. Can you please be more specific on what &amp;lt;sessionConfs&amp;gt;, &amp;lt;hiveConfs&amp;gt; and &amp;lt;hiveVars&amp;gt; are? Please forgive my ignorance, as I just started on this.&lt;/P&gt;&lt;P&gt;Regarding Hive LLAP, yes, I tried that and it gives great results. Just exploring this; sometimes you gotta make your bosses happy &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 23:40:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181133#M58583</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2017-03-30T23:40:51Z</dc:date>
    </item>
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181134#M58584</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3486/cstanca.html" nodeid="3486"&gt;@Constantin Stanca&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Sorry, I think I know what you meant. I connected from Beeline using the following:&lt;/P&gt;&lt;PRE&gt;[sree@alenzapd1nn ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline&amp;gt; !connect jdbc:hive2://alenzapd1snn:10016/mysql;num_executors=46;
&lt;/PRE&gt;&lt;P&gt;Is this what you meant?&lt;/P&gt;&lt;P&gt;This does not work, though. The application still uses only 47 containers and 24.5% of the cluster. Please see the attached image: &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/14224-beeline-cluster-usage.png"&gt;beeline-cluster-usage.png&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 23:50:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181134#M58584</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2017-03-30T23:50:59Z</dc:date>
    </item>
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181135#M58585</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/14345/sreekuppa.html"&gt;@Sree Kupp&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I need some clarification. You want to set num_executors to 46, which is a maximum, and you want to use the full capacity of the cluster, which is 96 cores, but you did not mention how much RAM is allocated to each container. For example, say you have 96 cores and 4 x 128 = 512 GB of RAM (simplistically, since some cores and memory must be reserved for other processes running on your cluster), and the memory allocated per container (one core per container) is 8 GB. Then 96 containers would require 96 * 8 = 768 GB. You are short 768 - 512 = 256 GB, which is 256/8 = 32 containers. That means that even if you set num_executors to 96, you could only spin up about 64.&lt;/P&gt;&lt;P&gt;What happened when you set num_executors to 96? On the other hand, even if you set it to 96, that only means the maximum is 96; Spark can decide to use fewer based on several factors, e.g. data locality.&lt;/P&gt;&lt;P&gt;Regarding the 47 vs. 46 containers, we need to investigate a bit more what the extra container does (it is likely the YARN ApplicationMaster).&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 00:27:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181135#M58585</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-03-31T00:27:24Z</dc:date>
    </item>
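The memory-capacity arithmetic in the reply above can be sketched as a quick calculation (using the corrected figures from the reply's own example: 4 x 128 = 512 GB of total RAM and 8 GB per single-core container):

```python
# Memory-side capacity check from the example in the reply.
nodes = 4
ram_per_node_gb = 128
total_ram_gb = nodes * ram_per_node_gb               # 512 GB

container_mem_gb = 8                                 # 8 GB per container, one core each
desired_containers = 96                              # one container per core

needed_gb = desired_containers * container_mem_gb    # 768 GB needed
shortfall_gb = needed_gb - total_ram_gb              # 256 GB short
containers_short = shortfall_gb // container_mem_gb  # 32 containers short

# Memory, not cores, is the binding constraint in this example.
achievable = desired_containers - containers_short   # 64 containers
print(shortfall_gb, containers_short, achievable)    # 256 32 64
```

This is why requesting num_executors=96 would still yield only about 64 executors on such a cluster: the container count is capped by whichever resource (cores or memory) runs out first.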
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181136#M58586</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14345/sreekuppa.html" nodeid="14345"&gt;@Sree Kupp&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The Spark Thrift Server properties are available under "Advanced spark2-thrift-sparkconf". The property spark.dynamicAllocation.maxExecutors only controls the number of executors in STS.&lt;/P&gt;&lt;P&gt;It looks like your query is only picking up 1 core (the default) per executor.&lt;/P&gt;&lt;P&gt;To increase the number of cores, you can try setting "spark.cores.max" in "Advanced spark2-thrift-sparkconf" (see &lt;A href="http://spark.apache.org/docs/latest/configuration.html" target="_blank"&gt;http://spark.apache.org/docs/latest/configuration.html&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Fri, 31 Mar 2017 02:37:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181136#M58586</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2017-03-31T02:37:24Z</dc:date>
    </item>
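The properties named in the reply above might be combined in "Advanced spark2-thrift-sparkconf" roughly as follows (a sketch only: the values are illustrative, the property names come from the reply and the linked Spark configuration page, and the right knob can depend on the cluster manager in use):

```properties
# Illustrative values only; tune to the cluster's actual capacity.
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.maxExecutors  46
spark.cores.max                       92
```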
    <item>
      <title>Re: Setting num_executors on Spark2 in Ambari</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181137#M58587</link>
      <description>&lt;P&gt;Thanks, &lt;A rel="user" href="https://community.cloudera.com/users/3486/cstanca.html" nodeid="3486"&gt;@Constantin Stanca&lt;/A&gt;. It all makes sense now.&lt;/P&gt;</description>
      <pubDate>Tue, 11 Apr 2017 00:54:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Setting-num-executors-on-Spark2-in-Ambari/m-p/181137#M58587</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2017-04-11T00:54:44Z</dc:date>
    </item>
  </channel>
</rss>

