I have the following requirement; I hope it is possible:
I have approximately 300 cores allocated in the grid (I am now doing a fresh install of the latest version on virtual servers).
Let's say I have 5 users, each of whom can take 100% of the grid's resources.
When only one user is running, that user takes 100% (OK).
When another user submits a job, I want them to share approximately 50% each. But when preemption is enabled, some of the jobs are preempted and killed (I want the killed jobs to return to the pool).
Or another solution:
Wait for the job to complete (it might take up to 5 minutes) and then reallocate the resource to another user. (This is preferred.)
I am assuming you have five different Capacity Scheduler queues set up, each with a guaranteed 20% and bursting up to 100% (it is probably better to cap a bit below that, such as 90%, to limit the preemption thrashing that can occur in some scenarios). The preemption behavior you describe sounds like it is working, but it would only take away 20% of user1's usage so that user2 could get its guarantee of 20%.
There are more fine-grained user limit factors within the queues, but those don't sound like what you want either, which I understand to be: 1 user (100%), 2 users (50% each), 3 users (33% each), 4 users (25% each), and 5 users (20% each).
With the 20-100% queue setup I suggested, you would get something like this: 1 user (100%); add a 2nd user (user1 80%, user2 20%); add a 3rd user (user1 60%, user2 20%, user3 20%); and so on down to the 5th user, at which point they all get 20%. Of course, that assumes a LOT of things. Generally speaking, each queue has a minimum guarantee, and preemption can help it get hold of that when other queues are consuming more than their minimum.
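To make the difference between the two behaviors concrete, here is a minimal Python sketch of the arithmetic only. It is not scheduler code: the function names and the simplified model (the first user keeps everything that is not needed to satisfy the later users' guarantees) are assumptions for illustration, and a real cluster will deviate from these round numbers.

```python
def equal_share(active_users):
    """Desired behaviour: each active user gets 100/n percent of the grid."""
    return [100 / active_users] * active_users


def guarantee_with_preemption(active_users, guarantee=20, max_capacity=100):
    """Simplified model (an assumption, not scheduler logic): the first user
    already holds the whole grid, and preemption takes back only enough for
    each later user to reach its guaranteed percentage."""
    if active_users == 0:
        return []
    later = [guarantee] * (active_users - 1)
    # The first user keeps what is left, capped at its queue's maximum capacity.
    first = min(max_capacity, 100 - sum(later))
    return [first] + later


# Compare the two behaviours for 1 to 5 active users.
for n in range(1, 6):
    print(n, equal_share(n), guarantee_with_preemption(n))
```

Calling `guarantee_with_preemption(n, max_capacity=90)` models the suggestion of capping bursting at 90% instead of 100%.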
I do not believe there is a way to get the time-delayed release of resources you asked for above.
Generally, I am trying to prevent the preempted job from ending in an error state and instead put it back into the pool, or even better, let it finish and free the container when the job finishes so it can be allocated to another user or another queue.