Created on 08-01-2018 06:45 AM - edited 08-17-2019 10:20 PM
Hi everyone,
I have a cluster managed by Yarn that runs Spark jobs; the components
were installed using Ambari (2.6.3.0-235). I have 6 hosts, each with 6
cores, and I use the Fair Scheduler.
I want Yarn to automatically add and remove executor cores, but no matter what I do it doesn't work.
Relevant Spark configuration (configured in Ambari):
spark.dynamicAllocation.schedulerBacklogTimeout 10s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
spark.driver.memory 4G
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 6 (has no effect - starts with 2)
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.minExecutors 1
spark.scheduler.mode FAIR
spark.shuffle.service.enabled true
SPARK_EXECUTOR_MEMORY="36G"
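For reference, this is a minimal sketch of how I would pass the same dynamic-allocation settings explicitly when creating a session, in case they are not picked up from spark-defaults.conf (the app name is just a placeholder, the values mirror the list above):

from pyspark.sql import SparkSession

# Minimal sketch: pass the dynamic allocation settings explicitly on the builder,
# in case the Ambari-managed defaults are not picked up by the session.
spark = (SparkSession.builder
         .appName("dyn-alloc-test")  # placeholder name
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "1")
         .config("spark.dynamicAllocation.initialExecutors", "6")
         .config("spark.dynamicAllocation.maxExecutors", "10")
         .getOrCreate())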
Relevant Yarn configuration (configured in Ambari):
yarn.nodemanager.aux-services mapreduce_shuffle,spark_shuffle,spark2_shuffle
YARN Java heap size 4096
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.preemption true
yarn.nodemanager.aux-services.spark2_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark2_shuffle.classpath {{stack_root}}/${hdp.version}/spark2/aux/*
yarn.nodemanager.aux-services.spark_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark_shuffle.classpath {{stack_root}}/${hdp.version}/spark/aux/*
Minimum Container Size (VCores) 0
Maximum Container Size (VCores) 12
Number of virtual cores 12
I also followed the manual and completed all the steps to configure the external shuffle service, and I copied the yarn-shuffle jar:
cp /usr/hdp/2.6.3.0-235/spark/aux/spark-2.2.0.2.6.3.0-235-yarn-shuffle.jar /usr/hdp/2.6.3.0-235/hadoop-yarn/lib/
I see that only 3 cores are allocated to the application (the default number of executors is 2, so I guess it's 2 executors + the driver), even though many tasks are pending (screenshots of the queue and the pending tasks are attached).
I want to get to a point where Yarn starts every application with 3 CPUs and then allocates more resources when there are pending tasks.
If it is relevant, I use Jupyter Notebook and findspark to connect to the cluster:
import findspark
findspark.init()  # make the cluster's Spark installation importable from the notebook

from pyspark.sql import SparkSession  # this import was missing from the snippet
spark = SparkSession.builder.appName("internal-external2").getOrCreate()
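In case it helps with the diagnosis, a check along these lines (a minimal sketch, run against the session created above) prints the dynamic-allocation settings as the running session actually sees them:

# Print the dynamic-allocation and shuffle-service settings of the live session
conf = spark.sparkContext.getConf()
for key, value in sorted(conf.getAll()):
    if "dynamicAllocation" in key or "shuffle.service" in key:
        print(key, "=", value)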
I would really appreciate any suggestion or help; there is no manual on this topic that I haven't tried.
thx a lot,
Anton
Created 08-01-2018 01:46 PM
Please double-check that the number of executor instances has not been set in the Spark conf, on the command line, or in code. If it is set, it will disable the dynamic allocation feature. In the Spark UI -> Environment tab, check whether the executor instances property is set to a number.
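You can also inspect the effective value programmatically; a minimal sketch, assuming a SparkSession named spark is already available in the notebook:

# Shows the configured value, or "<not set>" if executor instances were never fixed
print(spark.sparkContext.getConf().get("spark.executor.instances", "<not set>"))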
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 08-02-2018 06:29 AM
Hi @Felix Albani and thx for the answer,
I have double-checked and no executor-instances configuration is set. More than that, no memory, core, or similar configuration is set for the executor or the driver.
I do want to repeat that I run the calculations from a Jupyter Notebook, and it feels like the allocation falls back to the default configuration.
I have tried running a program with spark-submit, and the results are the same.