I hope you're doing well. I'm seeking your help with queue configuration. We have configured our Capacity Scheduler queues as per the link. We need your help configuring the cluster so that larger and smaller queries can run at the same time. Today, whenever a user submits a large query that deals with GBs of data, the application consumes all the free resource capacity, which blocks other, smaller queries. How do other big data projects handle this scenario? We have set up two queues, but is there a way in Templeton to submit a query to a particular queue? My scenario is as follows:
We have two queues, Q1 and Q2, each with 50% of the cluster resources. We submit queries to Hive through HiveServer2 and WebHCat (Templeton). When I submit a query to HiveServer2, it uses Q1's capacity, as set in the HiveServer2 config. Is there a setting that makes queries submitted through WebHCat go to Q2 only, or is there a command such as curl that accepts a WebHCat REST API parameter specifying which queue the query should be submitted to? We are seeing one big query block the others. How can we improve concurrency?
Thanks in advance
Hi @Mahender S,
Once you specify the queue as Deepesh suggested, you can also enable preemption, which ensures that an application submitted to Q2 can run even when an application submitted to Q1 occupies the whole cluster's resources.
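For the first step, specifying the queue, here is a minimal sketch of a WebHCat request that asks Hive to run in Q2. It relies on the `define` parameter of the `/templeton/v1/hive` endpoint, which sets a Hive configuration variable as `NAME=VALUE`. The host name, user, query, and status directory are placeholders I made up; which queue property takes effect depends on your execution engine (`mapreduce.job.queuename` for MapReduce, `tez.queue.name` for Tez), so both are passed here. Note that the WebHCat launcher job itself may still start in the default queue unless that is configured separately.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_hive_request(base_url, user, query, queue, status_dir):
    """Build (but do not send) a WebHCat POST that runs `query` in `queue`."""
    form = [
        ("execute", query),
        # Queue property read by the MapReduce execution engine.
        ("define", f"mapreduce.job.queuename={queue}"),
        # Queue property read by the Tez execution engine.
        ("define", f"tez.queue.name={queue}"),
        # HDFS directory where WebHCat writes stdout/stderr/exit code.
        ("statusdir", status_dir),
    ]
    url = f"{base_url}/templeton/v1/hive?{urlencode({'user.name': user})}"
    return Request(url, data=urlencode(form).encode("utf-8"), method="POST")


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_hive_request(
        base_url="http://webhcat-host:50111",
        user="mahender",
        query="select count(*) from my_table;",
        queue="Q2",
        status_dir="/tmp/webhcat_status",
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit the query, uncomment the next line:
    # print(urlopen(req).read().decode("utf-8"))
```

The same form fields can be sent with curl by passing each one with `-d`, using the same `define=NAME=VALUE` pairs.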
Capacity Scheduler preemption settings can be found below:
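As a rough sketch, and not a substitute for the official documentation, the two core properties that turn on Capacity Scheduler preemption in `yarn-site.xml` are the scheduler monitor switch and the preemption policy class. The small helper below (my own, purely for illustration) renders them as `<property>` elements you could paste into the file; tuning properties such as wait times and per-round limits are left out, so please check them against the docs for your Hadoop version.

```python
from xml.sax.saxutils import escape

# Core properties for enabling Capacity Scheduler preemption.
PREEMPTION_PROPS = {
    "yarn.resourcemanager.scheduler.monitor.enable": "true",
    "yarn.resourcemanager.scheduler.monitor.policies": (
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
        "ProportionalCapacityPreemptionPolicy"
    ),
}


def to_yarn_site(props):
    """Render a dict of properties as a yarn-site.xml style fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yarn_site(PREEMPTION_PROPS))
```

If you manage the cluster through Ambari, the same properties would be set through the YARN configuration screens rather than by editing the file by hand, and the ResourceManager needs a restart to pick them up.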