Created on 07-07-2016 11:25 PM - edited 08-17-2019 11:30 AM
I recently encountered a question where someone asked how to do preemption across YARN queues when a Spark job grows beyond its queue's minimum guarantee. They had seen this behavior before with the Fair Scheduler and MapReduce, and wanted the same experience with Spark and the Capacity Scheduler. This how-to article describes how to set that up.
Goal: Run large Spark jobs in two separate capacity queues to produce an equal share of resources for both jobs.
1. Add YARN preemption parameters
2. Set up Capacity Scheduler queues:
Child queue “test1” with a min capacity of 50% and a max of 100%
Child queue “test2” with a min capacity of 50% and a max of 100%
Root queue with a fair ordering policy
3. Run Spark jobs
Run a Spark job on test1 with max-size containers for as many Spark executors as possible
Run a Spark job on test2 with max-size containers using dynamic resource allocation
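The queue layout in step 2 could be expressed in capacity-scheduler.xml roughly as follows. This is a sketch: the queue names test1 and test2 come from the goal above, the 50%/100% capacities come from the step-2 list, and ordering-policy support varies by Hadoop version, so confirm the exact property names against your distribution's documentation.

```xml
<!-- Sketch of a capacity-scheduler.xml queue layout matching step 2.
     Values and ordering-policy support should be verified against
     your Hadoop/HDP version. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>test1,test2</value>
</property>
<property>
  <!-- Fair ordering on the root queue, per the step-2 list -->
  <name>yarn.scheduler.capacity.root.ordering-policy</name>
  <value>fair</value>
</property>
<property>
  <!-- test1: min guarantee 50%, can grow to 100% when idle capacity exists -->
  <name>yarn.scheduler.capacity.root.test1.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.test1.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <!-- test2: same min/max as test1 so the two jobs converge to an equal share -->
  <name>yarn.scheduler.capacity.root.test2.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.test2.maximum-capacity</name>
  <value>100</value>
</property>
```

Because each queue can burst to 100% but is only guaranteed 50%, preemption is what pulls a queue back down to its guarantee when the other queue's job arrives.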
1) Add YARN preemption parameters
The following parameters should be applied to the yarn-site.xml file. This can be done manually or through Ambari. These are the default preemption properties as provided in the Hortonworks documentation.
Option 1: Manual
Update /etc/hadoop/conf/yarn-site.xml with the YARN preemption parameters below.
Note: You must put these settings in XML format.
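As a sketch, these are the standard Capacity Scheduler preemption properties with the default values documented by Apache Hadoop; the exact values are assumptions here, so verify them against your HDP version's documentation before applying.

```xml
<!-- Standard Capacity Scheduler preemption properties with documented
     default values (verify against your distribution's docs). -->
<property>
  <!-- Enables the scheduler monitor that drives preemption -->
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <!-- The policy that preempts containers back to queue guarantees -->
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<property>
  <!-- How often (ms) the policy checks for preemption candidates -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<property>
  <!-- Grace period (ms) between marking a container and killing it -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
<property>
  <!-- At most 10% of cluster resources preempted per round -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
<property>
  <!-- Dampens preemption, letting some imbalance resolve naturally -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
  <value>0.2</value>
</property>
```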
Option 2: Ambari
To do this in Ambari, follow the instructions below:
The parameters are added to yarn-site.xml through Ambari -> YARN -> Configs. You can turn preemption on in the Settings tab; this sets yarn.resourcemanager.scheduler.monitor.enable=true. The remaining properties need to be added in the Advanced config tab under “Custom yarn-site”. Click “Add Property”, then add the following properties:
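In the Ambari “Add Property” dialog the remaining preemption settings are entered as key=value pairs. A sketch using the Apache-documented default values (confirm against your HDP version's documentation):

```
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=3000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=15000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor=0.2
```

Save the config and restart the affected YARN services when Ambari prompts you.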
Run a second Spark job on the test2 queue. Notice that this job does not specify the number of executors; that's because we are using Dynamic Resource Allocation in Spark, which became available in Spark 1.6.
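A sketch of what that second submission could look like. The example class, jar path, and container sizes below are placeholders (SparkPi from the Spark examples jar at a typical HDP location), not values from the original steps; note that dynamic allocation also requires the external shuffle service to be enabled on the NodeManagers.

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue test2 \
  --executor-memory 4g \
  --executor-cores 4 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark-client/lib/spark-examples.jar 10000
```

With no --num-executors flag, Spark scales the executor count up while test1's job holds the cluster, and YARN preemption reclaims containers from test1 until both queues sit at their 50% guarantee.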