
Setting num_executors on Spark2 in Ambari

Contributor

Where do I find the option to set the number of executors in Ambari?

When running spark-shell, I can pass --num-executors=46 and run the command. But I want to run Spark through beeline, and I am not sure where to set this parameter. My cluster has 4 datanodes, each with 24 processors, for a total of 96 processors in the cluster. I have set SPARK_EXECUTOR_CORES=2 in "Advanced spark2-defaults" in Ambari and want to set 46 executors.
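
For reference, this is the kind of spark-shell invocation that works for me (the values here are just an illustration of what I am trying to reproduce through beeline):

spark-shell --master yarn --num-executors 46 --executor-cores 2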

I tried setting "spark.dynamicAllocation.maxExecutors" to 46 and when I run the query through beeline (client mode), it uses only 47 containers and 24.5% of the cluster.

How can I make the application use all 96 containers (or the maximum number) and the full cluster?

1 ACCEPTED SOLUTION


@Sree Kupp

Spark Thrift Server properties are available under "Advanced spark2-thrift-sparkconf". The property spark.dynamicAllocation.maxExecutors only controls the maximum number of executors for the Spark Thrift Server (STS).

It looks like your query is only picking up 1 core (the default) per executor.

To increase the number of cores, you can try setting "spark.cores.max" in "Advanced spark2-thrift-sparkconf" (http://spark.apache.org/docs/latest/configuration.html).
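
For example, a starting point in "Advanced spark2-thrift-sparkconf" could look like the following (the values are only illustrative; adjust them to your cluster and restart the Spark2 Thrift Server afterwards):

spark.dynamicAllocation.enabled true
spark.dynamicAllocation.maxExecutors 46
spark.executor.cores 2
spark.cores.max 92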


6 REPLIES

Super Guru

@Sree Kupp

Have you tried setting num_executors=46 in the session variables of the JDBC connection string?

The JDBC URL has the following format:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

Try setting the <sessionConfs> parameter as:

num_executors=46;
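
For example, the assembled URL would look something like this (host, port and database name are placeholders for your Spark Thrift Server endpoint):

jdbc:hive2://<sts-host>:<port>/<dbName>;num_executors=46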

As you know, this usage is not documented nor supported by HWX or CDH.

I like to use Hive/LLAP instead:

https://cwiki.apache.org/confluence/display/Hive/LLAP

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_hive-performance-tuning/content/ch_hive_l...

This is supported and it is very promising. Additionally, due to the new features included in HDP 2.6, to be released in the next few weeks, it will be generally available and a definite option for Enterprise Data Warehouse Optimization and a single pane of SQL on Hadoop, with ANSI SQL 2011 compliance in the very near future.

Contributor

@Constantin Stanca

Thanks for your time. I am not sure I follow your solution with the JDBC URL connection string. Can you please be more specific about what <sessionConfs>, <hiveConfs> and <hiveVars> are? Please forgive my ignorance as I just started on this.

Regarding Hive LLAP, yes, I tried that and it gives great results. Just exploring this option; sometimes you gotta make your bosses happy 🙂

Contributor

@Constantin Stanca

Sorry, I think I know what you meant. I connected from beeline using the following:

[sree@alenzapd1nn ~]$
[sree@alenzapd1nn ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://alenzapd1snn:10016/mysql;num_executors=46;

Is this what you meant?

This does not work though. The application still uses only 47 containers and 24.5% of the cluster. Please see the attached image: beeline-cluster-usage.png

Super Guru

@Sree Kupp

I need some clarification. You want to set num_executors to 46, which is a maximum, but you also want to use the full capacity of the cluster, which is 96 cores; however, you did not mention anything about the RAM allocated to each container. For example, if you have 96 cores and 4 x 128 = 512 GB of RAM (simplistically, because some cores and memory need to be allocated to other processes running on your cluster), and your memory allocated per container (one core per container) is 8 GB, then 96 containers would require 96 * 8 = 768 GB. You are short 256 GB, which is about round(256/8) = 32 containers. That means that even if you set num_executors to 96, you could possibly spin up only 64.
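
Roughly, with those illustrative numbers:

total memory     = 4 nodes * 128 GB       = 512 GB
requested memory = 96 containers * 8 GB   = 768 GB
shortfall        = 768 GB - 512 GB        = 256 GB (about 32 containers)
achievable       = 96 - 32                = 64 containers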

What happened when you set num_executors to 96? On the other hand, even if you set it to 96, that maximum does not guarantee 96 executors; Spark can decide to use fewer based on several factors, e.g. data locality.

Regarding your 47 vs. 46 setting, we need to investigate a bit more what the extra container does.

Contributor

Thanks @Constantin Stanca. It all makes sense now.
