Is there a way to set a minimum/maximum number of containers for an application?

Explorer

Is there a way to set a minimum/maximum number of containers for an application? What I am observing in my cluster is that when YARN schedules a job, it gives that job pretty much all the available resources, depending on the queue settings. When the next job comes into the queue, let's say 5 seconds later, YARN checks how many resources are available. At that point all the resources have been given to the first job, which is still running, so the second job gets only the minimum allocation configured on the cluster. This creates a large gap between the two jobs, even though the same workload was submitted to the cluster both times. The first job completed in maybe 10 minutes because it got a lot of resources, but the second job, which came in with the minimum allocation, took 3 hours to complete. I am trying to avoid such a big gap in execution time.


3 REPLIES

Expert Contributor

@Khaja Hussain

Please see the attached screenshot of the YARN Queue Manager for an example.

1. You can control resources by having separate queues for different applications; that lets you limit the resources spent per queue.

2. Within a single queue, you can set "Minimum User Limit %" to control the share of resources a single user gets. You can also choose "Fair" as the ordering policy instead of "FIFO" (first in, first out).

3. You can also control the maximum percentage of the total cluster capacity that a single queue can take.

Hope the attached screenshot helps.

yarn-queue-config-to-control-min-max.jpg
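In case the screenshot is not visible, here is a rough sketch of how the three settings above map onto Capacity Scheduler property keys. The queue name MEDIUM is borrowed from later in this thread; the helper `queue_properties` and the values 30, 50, and 50 are placeholders I made up for illustration, not recommendations, and the exact keys available may depend on your Hadoop/HDP version.

```python
def queue_properties(queue, capacity, max_capacity, min_user_limit,
                     ordering_policy="fair"):
    """Build Capacity Scheduler key/value pairs for one queue under root.

    Hypothetical helper for illustration only; the numbers passed in
    below are placeholders, not tuned values.
    """
    prefix = f"yarn.scheduler.capacity.root.{queue}"
    return {
        # Guaranteed share of the parent queue, in percent.
        f"{prefix}.capacity": str(capacity),
        # Upper bound the queue may grow to, in percent (point 3).
        f"{prefix}.maximum-capacity": str(max_capacity),
        # "Minimum User Limit %" in the Queue Manager UI (point 2).
        f"{prefix}.minimum-user-limit-percent": str(min_user_limit),
        # "fifo" or "fair": how applications inside the queue are ordered.
        f"{prefix}.ordering-policy": ordering_policy,
    }


props = queue_properties("MEDIUM", capacity=30, max_capacity=50,
                         min_user_limit=50)

# Print in key=value form, e.g. for pasting into a scheduler config editor.
for key, value in props.items():
    print(f"{key}={value}")
```

Note that the minimum user limit divides resources between different users, so when one user submits both jobs it is the ordering policy that decides how they share the queue.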

Explorer

Thanks, Rajesh, for the response. I will take this back and make a few adjustments to see if it works. One other thing I want to clarify: we have 3 queues, HIGH, MEDIUM, and LOW. If a user submits a workload (for example, data processing for Jan 2017) to the MEDIUM queue, and the same user submits the same workload for Feb 2017 a few seconds later while the first job is still running, I want the execution times to be very similar. What I am seeing is a vast difference in execution time, which is what I am trying to avoid. This is happening within a single queue. I think setting the ordering policy to FAIR should give me similar execution times in this case.

Expert Contributor

Khaja Hussain

You haven't mentioned how long the workloads take when they run separately.

For example, can you run the Jan 2017 data processing on its own and record the number of containers it requires and the time it takes (say, 10 min)?

Repeat the same for the Feb 2017 data processing on its own (say, 12 min), then compare the number of containers each job demands.

Then set the ordering policy to FAIR, set the minimum user limit to 50% for the queue, and run both jobs in parallel. The resources should now be distributed equally between the two jobs. Expect the run time of each job to increase, because fewer containers are available to each of them.
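To get a rough feel for what to expect from that experiment, the arithmetic can be sketched as below. This assumes run time scales inversely with the number of containers, which real jobs only approximate. The 10 and 12 minute figures are the example numbers above; the 100 containers in the queue and the function names are hypothetical.

```python
def fair_share(total, demands):
    """Split `total` containers evenly, never giving a job more than it asks for.

    A simplified max-min fair split for illustration; the real scheduler
    also accounts for memory, vcores, and container sizes.
    """
    alloc = [0] * len(demands)
    remaining = total
    active = [i for i, d in enumerate(demands) if d > 0]
    while active and remaining > 0:
        share = remaining // len(active)
        if share == 0:
            break
        for i in list(active):
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            remaining -= give
            if alloc[i] == demands[i]:
                active.remove(i)  # demand satisfied, leftover goes to others
    return alloc


def estimated_minutes(solo_minutes, solo_containers, shared_containers):
    """Scale the solo run time, assuming perfectly linear speed-up."""
    return solo_minutes * solo_containers / shared_containers


queue_containers = 100          # hypothetical queue size
demands = [100, 100]            # each job would use the whole queue alone
jan_share, feb_share = fair_share(queue_containers, demands)

jan_minutes = estimated_minutes(10, 100, jan_share)
feb_minutes = estimated_minutes(12, 100, feb_share)
print(jan_share, feb_share, jan_minutes, feb_minutes)
```

Under these assumptions each job gets 50 containers and the estimates come out at about 20 and 24 minutes, so both jobs slow down but finish close together instead of 10 minutes versus 3 hours.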

Also, please refer to this article on user-limit-factor in a YARN queue:

https://community.hortonworks.com/content/supportkb/49640/what-does-the-user-limit-factor-do-when-us...

Please accept the answer if it helps with your query.