We've been using Tez Persistent Queues (Interactive Query Tuning) to optimize Tez queue performance. Separately, we now have Capacity Scheduler "Fair Sharing" policies, which allow separate jobs on the same queue to run with evenly shared resources.
How should we reconcile the two for an optimal Hive configuration? For example, should we configure persistent queues when queries are submitted as a single user ("hive"), and use Fair Sharing policies when jobs are submitted by a variety of users?
Any guidance on whether or not we should use the two in combination will help.
Should we configure Persistent queues when queries are submitted as one user "Hive" and use Fair Sharing policies if the job is submitted by a variety of users?
This makes sense. According to the docs, if multiple users (for example u1, u2, and u3) are submitting to the same queue, fair share helps: "if there is a query running already in a queue and taking up all of the resources, when the second session with a query is introduced, the sessions eventually end up with equal numbers of resources per session. Initially, there is a delay, but if ten queries are run concurrently most of the time, the resources are divided equally among them."
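The quoted behavior can be illustrated with a toy model. This is a simplification loosely inspired by fair ordering, not YARN's actual scheduler code: it assumes a queue of a fixed number of identical containers, no preemption, and a made-up `releases_per_step` rate at which the over-share session's containers finish. Freed containers go to whichever session is furthest below an equal split, so the first query's head start drains away over a few steps (the "initial delay") until every session holds the same share. `Session`, `fair_share_targets`, and `rebalance_step` are hypothetical names used only for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Session:
    """A query session in the queue and the containers it currently holds."""
    name: str
    held: int = 0


def fair_share_targets(total: int, sessions: list[Session]) -> dict[str, int]:
    """Split `total` containers equally; any remainder goes to the earliest sessions."""
    base, extra = divmod(total, len(sessions))
    return {s.name: base + (1 if i < extra else 0) for i, s in enumerate(sessions)}


def rebalance_step(total: int, sessions: list[Session], releases_per_step: int) -> None:
    """One scheduling step of the toy model (no preemption).

    Sessions above their equal share give back up to `releases_per_step`
    finished containers; every free container is then handed to the session
    that is furthest below its share.
    """
    targets = fair_share_targets(total, sessions)
    for s in sessions:
        over = s.held - targets[s.name]
        if over > 0:
            s.held -= min(over, releases_per_step)
    free = total - sum(s.held for s in sessions)
    while free > 0:
        neediest = min(sessions, key=lambda s: s.held - targets[s.name])
        neediest.held += 1
        free -= 1


if __name__ == "__main__":
    capacity = 100                        # hypothetical queue size in containers
    first = Session("q1", held=capacity)  # first query grabbed the whole idle queue
    second = Session("q2")                # second session arrives holding nothing
    sessions = [first, second]
    for step in range(1, 7):
        rebalance_step(capacity, sessions, releases_per_step=10)
        print(step, [s.held for s in sessions])
    # prints "1 [90, 10]" through "5 [50, 50]", then stays at [50, 50]
```

In this model, a larger `releases_per_step` (shorter-running containers) shortens the delay, but the end state is the same equal split.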
Are you looking for benchmarks or performance numbers comparing fair share with no fair share?
No need for benchmarks or performance numbers. Between "fair share" and "Tez persistent queues", I'm curious whether we should use both techniques in tandem, or instead understand when to choose one over the other.
Perhaps the "fair share" approach is best when many users need to share resources, whereas "Tez persistent queues" are most valuable when the absolute lowest query latency is the primary goal?
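The two mechanisms are configured in different places, so they are not mutually exclusive: fair ordering is a Capacity Scheduler setting on a leaf queue, while pre-started Tez sessions are a HiveServer2 setting that points at one or more queues. The sketch below renders both fragments as Hadoop-style XML, assuming a hypothetical queue named `interactive` and a hypothetical pool of two sessions; the property names are standard Capacity Scheduler and HiveServer2 settings, but the values are illustrative only. Because each pre-started Tez session runs as a long-lived YARN application, fair ordering on the same queue would tend to split that queue's resources among the sessions rather than letting the first busy one keep everything. Whether the pre-started pool is actually used for your users may depend on your Hive version and on `hive.server2.enable.doAs`, so that is worth verifying before relying on the combination.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def hadoop_properties_xml(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> XML block."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print (requires Python 3.9+)
    return ET.tostring(root, encoding="unicode")


QUEUE = "interactive"  # hypothetical leaf queue name under root

# capacity-scheduler.xml: order applications in this leaf queue fairly instead of FIFO.
CAPACITY_SCHEDULER = {
    f"yarn.scheduler.capacity.root.{QUEUE}.ordering-policy": "fair",
}

# hive-site.xml: have HiveServer2 pre-start a pool of Tez sessions on the same queue.
HIVE_SITE = {
    "hive.server2.tez.default.queues": QUEUE,
    "hive.server2.tez.sessions.per.default.queue": "2",  # hypothetical pool size
    "hive.server2.tez.initialize.default.sessions": "true",
}

if __name__ == "__main__":
    print("<!-- capacity-scheduler.xml -->")
    print(hadoop_properties_xml(CAPACITY_SCHEDULER))
    print("<!-- hive-site.xml -->")
    print(hadoop_properties_xml(HIVE_SITE))
```

Queue changes are typically applied with `yarn rmadmin -refreshQueues`, while hive-site.xml changes usually require a HiveServer2 restart.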