Not able to make Yarn dynamically allocate resources for Spark


Explorer

Hi everyone,

I have a cluster managed by YARN that runs Spark jobs; the components were installed using Ambari (2.6.3.0-235). I have 6 hosts, each with 6 cores, and I use the Fair Scheduler.

I want YARN to automatically add/remove executor cores, but no matter what I do it doesn't work.

Relevant Spark configuration (configured in Ambari):

spark.dynamicAllocation.schedulerBacklogTimeout 10s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s
spark.driver.memory 4G
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 6 (has no effect - starts with 2)
spark.dynamicAllocation.maxExecutors 10
spark.dynamicAllocation.minExecutors 1
spark.scheduler.mode FAIR
spark.shuffle.service.enabled true
SPARK_EXECUTOR_MEMORY="36G"
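
For reference, this is roughly how I could pass the same settings explicitly when building the session, instead of relying on the Ambari-managed defaults (just a sketch; the values are copied from the list above and the app name is only an example):

import findspark
findspark.init()  # must run before importing pyspark in the notebook

from pyspark.sql import SparkSession

# Sketch: pass the dynamic-allocation settings explicitly on the builder,
# with the values copied from the Ambari configuration above.
spark = (SparkSession.builder
         .appName("dyn-alloc-test")  # example name only
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.initialExecutors", "6")
         .config("spark.dynamicAllocation.minExecutors", "1")
         .config("spark.dynamicAllocation.maxExecutors", "10")
         .config("spark.shuffle.service.enabled", "true")
         .getOrCreate())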

Relevant Yarn configuration (configured in Ambari):

yarn.nodemanager.aux-services mapreduce_shuffle,spark_shuffle,spark2_shuffle
YARN Java heap size 4096
yarn.resourcemanager.scheduler.class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
yarn.scheduler.fair.preemption true
yarn.nodemanager.aux-services.spark2_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark2_shuffle.classpath {{stack_root}}/${hdp.version}/spark2/aux/*
yarn.nodemanager.aux-services.spark_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.aux-services.spark_shuffle.classpath {{stack_root}}/${hdp.version}/spark/aux/*
Minimum Container Size (VCores) 0
Maximum Container Size (VCores) 12
Number of virtual cores 12

I also followed the manual and completed all the steps to configure the external shuffle service; I copied the yarn-shuffle jar:

cp /usr/hdp/2.6.3.0-235/spark/aux/spark-2.2.0.2.6.3.0-235-yarn-shuffle.jar /usr/hdp/2.6.3.0-235/hadoop-yarn/lib/

I see only 3 cores allocated to the application (the default number of executors is 2, so I guess it's 2 executors + the driver), even though many tasks are pending. Screenshots of the queue and of the pending tasks are attached.

I want to get to a point where YARN starts every application with 3 CPUs, but allocates more resources when there are pending tasks.

If it is relevant, I use Jupyter Notebook and findspark to connect to the cluster:

import findspark
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("internal-external2").getOrCreate()
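
To check whether this notebook session actually picks up the Ambari-managed values, something like the following can be run right after the session is created (a minimal sketch):

# Sketch: print the effective dynamic-allocation settings of the running session.
# spark.conf.get() accepts a default, so unset keys show up as "<not set>".
for key in ["spark.dynamicAllocation.enabled",
            "spark.dynamicAllocation.initialExecutors",
            "spark.dynamicAllocation.minExecutors",
            "spark.dynamicAllocation.maxExecutors",
            "spark.shuffle.service.enabled"]:
    print(key, "=", spark.conf.get(key, "<not set>"))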


I would really appreciate any suggestion or help; there is no guide on this topic that I haven't tried.
Thanks a lot,
Anton

83490-queue.png

83491-tasks.png

2 REPLIES

Re: Not able to make Yarn dynamically allocate resources for Spark

@Anton P

Please double-check that the number of executor instances (spark.executor.instances) has not been set in the Spark conf, on the command line, or in code. If it is set, it disables dynamic allocation. In the Spark UI -> Environment tab, check whether the executor instances property is set to a number.
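
If you want to run the same check from the notebook rather than the UI, something along these lines should work (a rough sketch, assuming your session object is called spark):

# Sketch: check whether an explicit executor count is set on the running session;
# if it is, it overrides dynamic allocation.
conf = spark.sparkContext.getConf()
if conf.contains("spark.executor.instances"):
    print("spark.executor.instances =", conf.get("spark.executor.instances"))
else:
    print("spark.executor.instances is not set")
print("dynamic allocation enabled:", conf.get("spark.dynamicAllocation.enabled", "false"))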

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

Re: Not able to make Yarn dynamically allocate resources for Spark

Explorer

Hi @Felix Albani, and thanks for the answer,

I have double-checked, and no executor-instances configuration is set. Moreover, no memory, cores, or any similar configuration is set for the executors or the driver.

I do want to repeat that I run the calculations from a Jupyter Notebook, and it feels like the allocation falls back to the default configuration.

I have tried running a program with spark-submit, and the results are the same.
