Configuring YARN queues for Spark notebooks

Solved

New Contributor:

We're using the capacity scheduler on YARN with several queues, one of which is reserved for Spark notebooks (Jupyter, Zeppelin). Many of our users leave their notebooks open for days on end, and for most of that time they are not using the CPU and memory they have claimed.

What would be a good configuration for this use case? Is it possible to configure YARN/Spark in such a way that inactive notebooks do not hinder other users?

1 ACCEPTED SOLUTION

Re: Configuring YARN queues for Spark notebooks

@R Pul Yes, that is a common problem. The first thing I would try, at the Spark configuration level, is enabling Dynamic Resource Allocation. Here is a description, from the Spark job-scheduling docs linked below:

"Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to your application up and down based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be returned to the cluster’s pool of resources and acquired by other applications. In Spark, dynamic resource allocation is performed on the granularity of the executor and can be enabled through spark.dynamicAllocation.enabled."

And in particular, the Remove Policy:

"The policy for removing executors is much simpler. A Spark application removes an executor when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds."
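
As a concrete sketch, something like the following could go into spark-defaults.conf (or the notebook server's Spark interpreter settings). The numeric values are illustrative assumptions, not recommendations, and spark.dynamicAllocation.cachedExecutorIdleTimeout only exists in Spark releases newer than the 1.2 docs quoted above:

    # Spark dynamic allocation: grow and shrink executors with the workload
    spark.dynamicAllocation.enabled                    true
    # Required by dynamic allocation on YARN: an external shuffle service
    # keeps shuffle files available after an executor is removed
    spark.shuffle.service.enabled                      true
    # Let an idle notebook shrink all the way back to zero executors
    spark.dynamicAllocation.minExecutors               0
    # Cap what any single notebook can claim from the queue (illustrative)
    spark.dynamicAllocation.maxExecutors               10
    # Release an executor after 120 seconds of inactivity (illustrative)
    spark.dynamicAllocation.executorIdleTimeout        120
    # In later Spark releases: also release executors holding cached data,
    # which are otherwise kept alive indefinitely
    spark.dynamicAllocation.cachedExecutorIdleTimeout  600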

Web page:

https://spark.apache.org/docs/1.2.0/job-scheduling.html

Also, check out the section entitled "Graceful Decommission of Executors" on that page for more information.
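
Note that dynamic allocation on YARN depends on the external shuffle service running on every NodeManager, which is what the graceful-decommission mechanism relies on to preserve shuffle files after an executor is removed. A minimal yarn-site.xml sketch (the spark_shuffle service name and class are the standard ones from the Spark-on-YARN docs; a management console such as Ambari or Cloudera Manager may configure this for you):

    <!-- yarn-site.xml on every NodeManager: register Spark's external
         shuffle service alongside the MapReduce one -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

The Spark YARN shuffle JAR for your Spark version must also be on the NodeManager classpath, and the NodeManagers restarted afterwards.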
