Limit Hive query resource consumption in YARN

Hi,

I'm facing the following problem.

I have a service user, say "service_dwh", used by our data warehouse, which heavily queries our data reservoir using Hive queries.

In some cases, because of the query itself and/or missing statistics, a single Hive query has taken 100% of the resources available to the "service_dwh" user.

I couldn't find a way, using the Capacity Scheduler, queues, and the user limit factor, to prevent a single application from holding all the resources for a very long time.

Traditional DBMSs have mechanisms that throttle a job's resources based on the job's duration.

That way a long (big) job can't monopolize resources needed by new and potentially shorter jobs for too long.

3 REPLIES

Hi tj,

I already knew of these options but sadly, as mentioned in my post, all the queries are run by the same service user.

Hence I have no way to use the user limit factor.

I would like to have a query limit factor in Hive, or some way to prevent one query from using too much capacity even when it is available.

Gael
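
To make the limitation above concrete, here is a minimal sketch of the per-user ceiling in the Capacity Scheduler, assuming the usual simplified model: a user can hold at most the queue's capacity multiplied by user-limit-factor, bounded by the queue's maximum-capacity. The function name and the numbers are hypothetical, and the model ignores minimum-user-limit-percent and preemption.

```python
def per_user_cap(queue_capacity_pct, user_limit_factor, queue_max_capacity_pct):
    """Simplified upper bound (percent of cluster) that one user can hold in a queue."""
    return min(queue_capacity_pct * user_limit_factor, queue_max_capacity_pct)


# Hypothetical queue: 40% guaranteed, allowed to grow to 80% of the cluster.
cap_with_factor_2 = per_user_cap(40.0, 2.0, 80.0)
cap_with_factor_1 = per_user_cap(40.0, 1.0, 80.0)

# The ceiling applies to the user as a whole, not to each application.
# With a single service user, one long-running query may therefore grow
# to the full ceiling while later queries from the same user wait.
print(cap_with_factor_2, cap_with_factor_1)
```

Lowering the factor shrinks the ceiling for every query run by service_dwh at once, which is why it does not act as a per-query limit.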

Master Collaborator

Why not use resource pools and sub-pools? If the problem is with specific queries, create a dedicated resource pool or sub-pool for them and pass that pool when submitting those queries.
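
As a rough sketch of this suggestion in Capacity Scheduler terms (queues rather than resource pools), the snippet below builds the properties for two child queues under a parent and the Hive statement that routes a session to one of them. The queue names `root.dwh`, `adhoc`, and `heavy`, the percentages, and both helper functions are assumptions for illustration; the property keys (`capacity`, `maximum-capacity`, `queues`, `tez.queue.name`, `mapreduce.job.queuename`) are the standard ones.

```python
def sub_queue_properties(parent, shares, max_shares):
    """Build Capacity Scheduler properties for child queues under `parent`.

    shares:     child queue name -> guaranteed capacity (percent of the parent)
    max_shares: child queue name -> maximum-capacity (percent of the parent)
    """
    # Child capacities under one parent must add up to 100.
    if abs(sum(shares.values()) - 100) > 1e-9:
        raise ValueError("child capacities must sum to 100")
    prefix = "yarn.scheduler.capacity." + parent
    props = {prefix + ".queues": ",".join(shares)}
    for name, share in shares.items():
        props[f"{prefix}.{name}.capacity"] = str(share)
        props[f"{prefix}.{name}.maximum-capacity"] = str(max_shares[name])
    return props


def hive_queue_statement(queue, engine="tez"):
    """Return the Hive SET statement that sends the session's jobs to `queue`."""
    key = "tez.queue.name" if engine == "tez" else "mapreduce.job.queuename"
    return f"SET {key}={queue};"


# Hypothetical layout: heavy queries are capped at 40% of the parent queue.
props = sub_queue_properties(
    "root.dwh",
    shares={"adhoc": 70, "heavy": 30},
    max_shares={"adhoc": 100, "heavy": 40},
)
for key, value in props.items():
    print(f"{key}={value}")
print(hive_queue_statement("heavy"))
```

The cap comes from `maximum-capacity` on the `heavy` queue, so it only helps if the expensive queries can be identified in advance and submitted with the matching SET statement.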