
Spark - YARN Capacity Scheduler

Contributor

Can we configure the capacity scheduler in such a way that a spark job only runs when it can procure enough resources?

In the current FIFO setup, a Spark job will start running once it can get a few of the required executors, but it will then fail because it couldn't get enough resources.

I would like the spark job to only start when it can procure all the required resources.

1 ACCEPTED SOLUTION

Contributor

Thanks Ravi,

I had to:

1) Copy spark shuffle jars to nodemanager classpaths on all nodes

2) add spark_shuffle to yarn.nodemanager.aux-services, set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService in yarn-site.xml (via Ambari)

3) Restart all nodemanagers

4) Add the following to spark-defaults.conf

spark.dynamicAllocation.enabled true

spark.shuffle.service.enabled true

5) Set these parameters on a per-job basis

spark.dynamicAllocation.initialExecutors=#

spark.dynamicAllocation.minExecutors=#
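As a sketch, step 2 above corresponds to a yarn-site.xml fragment like the following (the `mapreduce_shuffle` entry is an assumption — keep whatever aux-services value your cluster already has and append `spark_shuffle` to it):

```xml
<!-- yarn-site.xml: register the Spark shuffle service as a NodeManager aux service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <!-- mapreduce_shuffle assumed already present; spark_shuffle appended -->
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```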


8 REPLIES

Rising Star
@Nasheb Ismaily

You might need to set minimum-user-limit-percent (say, 30%):

yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent

Unless 30% of the queue capacity is available, the job will not start.
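As a sketch, that property would go into capacity-scheduler.xml (the queue path `root.support.services` is taken from the property name above; substitute your own queue path):

```xml
<!-- capacity-scheduler.xml: per-user minimum share for the support.services queue -->
<property>
  <name>yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent</name>
  <!-- each active user is guaranteed at least 30% of the queue's resources -->
  <value>30</value>
</property>
```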

Contributor

Thank you, I'll try this out.

Guru

This will only work at the user level, not at the job level. So if the user has other jobs and already holds that percentage of the queue, the Spark job will start even before it can get the resources it needs.

Rising Star

I agree @Ravi Mutyala

Guru

If you are not using dynamic allocation, a submitted job will not start until it gets all of its resources: you are asking for N executors, so YARN will not let the job proceed until all N are granted.

If you are using dynamic allocation, then setting spark.dynamicAllocation.minExecutors to a higher value will mean that the job gets scheduled only if minExecutors are met.
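As a sketch, with dynamic allocation enabled, these properties can also be raised at submit time with `--conf` flags (the class name and jar below are placeholders for your application):

```
# Placeholders: com.example.Job1 / job1.jar stand in for your application.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=8 \
  --class com.example.Job1 \
  job1.jar
```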

Contributor

Thanks Ravi, this is very close to what I need.

Question: spark.dynamicAllocation.minExecutors seems to be a global property in spark-defaults.

Is there a way to set this property on a job-by-job basis?

Spark job1 -> min executors 8

Spark job2 -> min executors 5

Guru

I think it's spark.dynamicAllocation.initialExecutors that you can set per job. Try putting it in a property file and passing it with --properties-file. I haven't tried this myself, so let me know how it works.
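That suggestion might look like this (untested, per the comment above; `job1.properties` and the application names are hypothetical):

```
# job1.properties (hypothetical) would contain:
#   spark.dynamicAllocation.initialExecutors=8
#   spark.dynamicAllocation.minExecutors=8
spark-submit --master yarn --properties-file job1.properties --class com.example.Job1 job1.jar

# job2.properties would set minExecutors=5 instead, giving each job its own values.
```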

