Hive User Concurrency - Reconciling YARN Capacity Scheduler Fair Sharing Policy and Tez Persistent Queue Design

Expert Contributor

We've been using Tez Persistent Queues (Interactive Query Tuning) to optimize Tez queue performance. Separately, we now have Capacity Scheduler "Fair Sharing" policies, which allow separate jobs on the same queue to execute with evenly shared resources.

How should we reconcile the two for optimal Hive configuration? For example, should we configure persistent queues when queries are submitted as one user, "Hive", and use Fair Sharing policies when jobs are submitted by a variety of users?

Any guidance on whether or not we should use the two in combination will help.


1 ACCEPTED SOLUTION

Master Mentor
@Wes Floyd

I pinged sme-hive for an answer, and @gopal responded with the following statement: half of all interactive tuning will be replaced by LLAP. Hive 2.0 is days from being released at Apache.


REPLIES

Master Mentor

Should we configure Persistent queues when queries are submitted as one user "Hive" and use Fair Sharing policies if the job is submitted by a variety of users?

Hi Wes,

This makes sense. According to the docs, if multiple users (such as u1, u2, u3) are hitting the queue, then fair share will help: "if there is a query running already in a queue and taking up all of the resources, when the second session with a query is introduced, the sessions eventually end up with equal numbers of resources per session. Initially, there is a delay, but if ten queries are run concurrently most of the time, the resources are divided equally among them."
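To make the quoted behavior concrete, here is a toy Python model of it. This is only a sketch, not YARN code: it assumes every container is the same size, that each running session finishes one task per step, and that each freed container goes to whichever session currently holds the fewest. The function name and the step model are made up for illustration, and preemption is ignored.

```python
def simulate_fair_ordering(initial, steps):
    """Toy model: least-usage-first reassignment of freed containers.

    initial -- containers held by each session at the start
    steps   -- number of rounds to simulate
    Returns the allocation after every round, starting with the initial one.
    """
    usage = list(initial)
    history = [tuple(usage)]
    for _round in range(steps):
        # Every session that holds containers finishes one task and frees one.
        freed = 0
        for i, held in enumerate(usage):
            if held > 0:
                usage[i] -= 1
                freed += 1
        # Hand each freed container to the session with the least usage.
        for _container in range(freed):
            neediest = min(range(len(usage)), key=lambda i: usage[i])
            usage[neediest] += 1
        history.append(tuple(usage))
    return history


# One session holds all 10 containers when a second session arrives.
history = simulate_fair_ordering([10, 0], steps=6)
print(history)
# [(10, 0), (9, 1), (8, 2), (7, 3), (6, 4), (5, 5), (5, 5)]
```

The second session starts with nothing and only gains containers as the first one releases them, which is the "initial delay" the docs describe. After a few rounds the split settles at an even share.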

Are you looking for benchmarks or performance numbers for this "fair share vs. non"?

Expert Contributor

No need for benchmarks or performance numbers. Between "fair share" and "tez persistent queues", I'm curious whether we should use both techniques in tandem, or how to decide when to choose one over the other.

Perhaps the "fair share" approach is best when trying to reconcile many users sharing resources, then "tez persistent queues" are valuable when absolute lowest latency for queries is the primary goal?
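For reference, the two techniques are configured in different places, so a sketch of the settings side by side may help frame the question. It assumes "tez persistent queues" means HiveServer2 default Tez sessions with pre-warmed containers. The queue name "interactive" and all values are hypothetical, and the small Python helper only renders Hadoop-style XML for illustration. It is not a recommended configuration.

```python
from xml.sax.saxutils import escape

QUEUE_PATH = "root.interactive"  # hypothetical queue path

# capacity-scheduler.xml: order applications within the queue by fair share.
CAPACITY_SCHEDULER = {
    f"yarn.scheduler.capacity.{QUEUE_PATH}.ordering-policy": "fair",
}

# hive-site.xml: keep Tez sessions alive on the queue (illustrative values).
HIVE_SITE = {
    "hive.server2.tez.default.queues": "interactive",
    "hive.server2.tez.sessions.per.default.queue": "4",
    "hive.server2.tez.initialize.default.sessions": "true",
    "hive.prewarm.enabled": "true",
    "hive.prewarm.numcontainers": "10",
}


def to_hadoop_xml(props):
    """Render a dict of properties as a Hadoop <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


print(to_hadoop_xml(CAPACITY_SCHEDULER))
print(to_hadoop_xml(HIVE_SITE))
```

The ordering policy lives in the scheduler configuration and the session settings live in the Hive configuration, so setting one does not by itself turn the other on or off.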

Master Mentor

@Wes Floyd are you still having issues with this? Can you accept best answer or provide your own solution?

Expert Contributor

@Artem Ervits - this question has not yet been completely answered.
