Created 05-25-2016 07:09 PM
Can we configure the Capacity Scheduler in such a way that a Spark job only runs when it can procure enough resources?
In the current FIFO setup a Spark job will start running if it can get only a few of the required executors, but then fail because it couldn't get enough resources.
I would like the Spark job to start only when it can procure all the required resources.
Created 05-25-2016 07:25 PM
You might need to set the minimum-user-limit-percent (say 30%)
yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent
Unless 30% of the queue capacity is available, the job will not start.
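In capacity-scheduler.xml that would look roughly like this (assuming your queue path really is root.support.services; substitute your own queue name):
<property>
  <name>yarn.scheduler.capacity.root.support.services.minimum-user-limit-percent</name>
  <value>30</value>
</property>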
Created 05-25-2016 07:27 PM
Thank you, I'll try this out.
Created 05-25-2016 07:31 PM
This will only work at the user level, not at the job level. So if the user has other jobs running and already holds that percentage of the queue, the Spark job will start even before it can get its required resources.
Created 05-25-2016 09:20 PM
I agree @Ravi Mutyala
Created 05-25-2016 09:04 PM
If you are not using dynamic allocation, the job you submit will not start until it gets all the resources. You are asking for N executors, so YARN will not let the job proceed until all N executors are allocated.
If you are using dynamic allocation, then setting spark.dynamicAllocation.minExecutors to a higher value means the job gets scheduled only once minExecutors can be met.
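For example, in spark-defaults.conf that would look something like this (the executor count is just a placeholder):
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 8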
Created 05-25-2016 09:19 PM
Thanks Ravi, this is very close to what I need.
One question: spark.dynamicAllocation.minExecutors seems to be a global property in spark-defaults.conf.
Is there a way to set this property on a job-by-job basis?
Spark job1 -> min executors 8
Spark job2 -> min executors 5
Created 05-25-2016 09:41 PM
I think it's spark.dynamicAllocation.initialExecutors that you can set per job. Try putting it in a properties file and passing it with --properties-file. I haven't tried this myself, so let me know how it works.
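Something along these lines (untested sketch; the file name, class, and jar below are placeholders):
job1-spark.conf:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.initialExecutors 8
spark.dynamicAllocation.minExecutors 8
spark-submit --master yarn --properties-file job1-spark.conf --class com.example.Job1 job1.jar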
Created 06-01-2016 12:48 AM
Thanks Ravi,
I had to:
1) Copy the Spark shuffle jars to the NodeManager classpath on all nodes
2) Add spark_shuffle to yarn.nodemanager.aux-services and set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService in yarn-site.xml (via Ambari; see the yarn-site.xml sketch after this list)
3) Restart all NodeManagers
4) Add the following to spark-defaults.conf
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
5) Set these parameters on a per-job basis (example spark-submit after this list)
spark.dynamicAllocation.initialExecutors=#
spark.dynamicAllocation.minExecutors=#
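For anyone following the same steps, the yarn-site.xml entries from step 2 end up roughly like this (Ambari writes the equivalent for you; keep whatever aux-services you already have, e.g. mapreduce_shuffle, and append spark_shuffle):
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Step 5 can also be passed per job on the command line (executor counts, class, and jar are placeholders):
spark-submit --master yarn --conf spark.dynamicAllocation.initialExecutors=8 --conf spark.dynamicAllocation.minExecutors=8 --class com.example.MyJob myjob.jar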