Support Questions

Is there a way to set a minimum/maximum number of containers for an application?

Explorer

Is there a way to set a minimum/maximum number of containers for an application? What I am observing in my cluster is that when YARN schedules a job, it gives that job pretty much all the available resources, depending on the queue settings. When the next job comes into the queue, say 5 seconds later, YARN checks how many resources are available. Because everything has already gone to the first job, which is still running, the second job gets only the minimum allocation configured on the cluster. This creates a large gap between the two jobs, even though the same workload was submitted for both. The first job completed in maybe 10 minutes because it got a lot of resources, but the second job, which got only the minimum allocation, took 3 hours to complete. I am trying to avoid such a big gap in execution time.

3 REPLIES

Expert Contributor

@Khaja Hussain

Please see the attached YARN Queue Manager screenshot for an example.

1. You can control resources by using separate queues for different applications. Each queue's capacity then limits what the applications in it can consume.

2. Within a single queue, you can set "Minimum User Limit %" to control the percentage of resources a single user gets. You can also choose "Fair" as the ordering policy instead of "FIFO" (first in, first out).

3. You can also cap the maximum percentage of the total cluster capacity that a single queue can take.
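
For anyone editing the configuration directly rather than through the Queue Manager screen, the three options above correspond to Capacity Scheduler properties in capacity-scheduler.xml. Here is a minimal Python sketch that builds those property names for one queue. The queue path `root.etl`, the helper name `queue_properties`, and every percentage value are made-up placeholders for illustration, not settings taken from this thread.

```python
def queue_properties(queue_path, capacity, maximum_capacity,
                     minimum_user_limit_percent, ordering_policy="fair"):
    """Build Capacity Scheduler property names and values for one leaf queue.

    queue_path is a dotted path such as "root.etl" (a hypothetical queue).
    """
    # Only the two ordering policies mentioned in this thread are handled.
    if ordering_policy not in ("fifo", "fair"):
        raise ValueError("this sketch only covers the fifo and fair policies")
    prefix = f"yarn.scheduler.capacity.{queue_path}"
    return {
        f"{prefix}.capacity": str(capacity),
        f"{prefix}.maximum-capacity": str(maximum_capacity),
        f"{prefix}.minimum-user-limit-percent": str(minimum_user_limit_percent),
        f"{prefix}.ordering-policy": ordering_policy,
    }


# Placeholder values only: tune them for your own cluster.
props = queue_properties("root.etl", capacity=40, maximum_capacity=60,
                         minimum_user_limit_percent=50)
for name, value in props.items():
    print(f"{name}={value}")
```

Each printed line would become one name/value property in capacity-scheduler.xml, followed by a queue refresh for the change to take effect.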

Hope the attached screenshot helps.

yarn-queue-config-to-control-min-max.jpg

Expert Contributor

Khaja Hussain

You haven't mentioned how long the workloads take when they run separately.

For example, can you run the Jan 2017 data processing on its own and record the number of containers it requires and the time it takes (say 10 min)?

In the same way, repeat for the Feb 2017 data processing on its own (say 12 min). Then compare the number of containers each job demands.

Then set the ordering policy to FAIR and the Minimum User Limit to 50% for the queue, and run both jobs in parallel. Resources should now be distributed equally between the two jobs. Expect the run time of each job to increase, because fewer containers are available to each one.
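
To get a rough feel for how much the run times might grow, here is a back-of-the-envelope Python sketch. It is a simplification under loud assumptions: each job can use every container it is given, run time scales linearly with the number of containers, and the scheduler splits containers equally among the running jobs. The 100-container queue size is a made-up figure; the 10 and 12 minute solo timings are the example numbers from above. Real jobs will not scale this cleanly.

```python
def simulate_equal_share(total_containers, solo_minutes):
    """Estimate finish times when all jobs start together and share equally.

    solo_minutes maps a job name to its run time (minutes) when it has all
    containers to itself. Returns a dict of job name -> finish time (minutes).
    """
    # Remaining work per job, measured in container-minutes.
    remaining = {job: minutes * total_containers
                 for job, minutes in solo_minutes.items()}
    finish = {}
    now = 0.0
    while remaining:
        share = total_containers / len(remaining)    # equal split
        next_done = min(remaining, key=remaining.get)
        elapsed = remaining[next_done] / share       # time until it finishes
        now += elapsed
        for job in remaining:
            remaining[job] -= elapsed * share
        del remaining[next_done]                     # its share is released
        finish[next_done] = now
    return finish


# Hypothetical 100-container queue with the example timings from this reply.
finish = simulate_equal_share(100, {"jan_2017": 10, "feb_2017": 12})
print(finish)  # {'jan_2017': 20.0, 'feb_2017': 22.0}
```

Under these assumptions both jobs finish in roughly 20 to 22 minutes, instead of one finishing in 10 minutes and the other being starved for hours.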

Also, please refer to this article on user-limit-factor in a YARN queue:

https://community.hortonworks.com/content/supportkb/49640/what-does-the-user-limit-factor-do-when-us...
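
As a quick illustration of that setting, here is a small Python sketch of how user-limit-factor is usually described in the Capacity Scheduler documentation: it is a multiplier on the queue's configured capacity that bounds what a single user can take, and the result cannot exceed the queue's maximum capacity. The function name and the percentages are assumptions for illustration only, and the sketch ignores other limits such as the minimum user limit.

```python
def max_single_user_percent(queue_capacity, queue_maximum_capacity,
                            user_limit_factor):
    """Upper bound (as % of the cluster) on what one user can hold in a queue.

    Simplified model: configured capacity times user-limit-factor, capped
    by the queue's maximum capacity.
    """
    return min(queue_capacity * user_limit_factor, queue_maximum_capacity)


# Hypothetical queue: 25% configured capacity, allowed to grow to 100%.
for factor in (1, 2, 5):
    print(factor, max_single_user_percent(25, 100, factor))
```

With a factor of 1, a single user in this hypothetical queue stays within the 25% configured capacity even when the rest of the cluster is idle; larger factors let one user grow further, up to the queue's maximum capacity.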

Please accept the answer if this helps with your query.