Support Questions

gael__urbauer · 08-22-2023

Hi,

I am facing the following problem.

I have a service user, say "service_dwh", used by our data warehouse, which heavily queries our data reservoir using Hive queries.

I have had some cases where, due to the query itself and/or missing statistics, a single Hive query could take 100% of the resources available to the "service_dwh" user.

Using the Capacity Scheduler, queues, and the user limit factor, I couldn't find a way to prevent a single application from taking all the resources for a very long time.
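For context, here is a simplified model of why the user limit factor does not help when everything runs as one user. It assumes the Capacity Scheduler rule that a single user may take up to `user-limit-factor` × the queue's configured capacity, bounded by the queue's maximum capacity, and it assumes FIFO ordering of that user's applications inside the queue. Capacities are in percent of the cluster; the function names and numbers are hypothetical, and this is a sketch, not YARN's scheduler code.

```python
def user_cap(queue_capacity: float, queue_max_capacity: float,
             user_limit_factor: float) -> float:
    """Most of the cluster (in %) one user may hold in a queue (simplified model)."""
    return min(queue_capacity * user_limit_factor, queue_max_capacity)


def fifo_allocation(cap: float, demands: list[float]) -> list[float]:
    """Hand out a single user's cap to that user's applications in submission order.

    Each application takes as much as it asks for until the cap is exhausted;
    nothing limits an individual application below the user's cap.
    """
    allocation = []
    remaining = cap
    for demand in demands:
        granted = min(demand, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


# Hypothetical numbers: queue at 20% capacity, 50% maximum, user-limit-factor 2.
cap = user_cap(20, 50, 2)                 # 40
# One heavy Hive query asking for 90%, then a small one asking for 5%.
print(fifo_allocation(cap, [90, 5]))      # [40, 0]
```

Because the limit is per user, the first query alone can exhaust the "service_dwh" cap and the second query waits.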

Traditional DBMSs have mechanisms that throttle a job's resources based on its duration.

That way, a long (big) job can't monopolize resources for too long at the expense of new, potentially shorter jobs.
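As an illustration only, one way such duration-based throttling can work is to let each job's scheduling weight decay the longer it runs and to split capacity in proportion to those weights. The half-life decay, the floor, and all names below are assumptions for this sketch; it is not how YARN or any specific DBMS implements it.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    runtime_s: float  # how long the job has been running so far


def decayed_weight(runtime_s: float, half_life_s: float = 600.0,
                   floor: float = 0.1) -> float:
    """Weight halves every half_life_s seconds but never drops below floor."""
    return max(2.0 ** (-runtime_s / half_life_s), floor)


def duration_weighted_shares(jobs: list[Job], capacity: float,
                             half_life_s: float = 600.0,
                             floor: float = 0.1) -> dict[str, float]:
    """Split capacity between jobs in proportion to their decayed weights."""
    weights = {j.name: decayed_weight(j.runtime_s, half_life_s, floor) for j in jobs}
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}


jobs = [Job("big_etl", runtime_s=1800), Job("new_report", runtime_s=0)]
shares = duration_weighted_shares(jobs, capacity=100)
# big_etl (weight 0.125) gets about 11%, new_report (weight 1.0) about 89%.
print({name: round(s, 1) for name, s in shares.items()})
```

The floor keeps a long job from being starved entirely; it only loses priority relative to newer work.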

tj2007 · 09-05-2023

Hi @gael__urbauer

Please refer to the articles below and see whether they are what you are looking for:

[1] https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/yarn-allocate-resources/topics/yarn-configure...
[2] https://blog.cloudera.com/yarn-capacity-scheduler/#:~:text=User%20Limit%20Factor%20is%20a,minimum%20....

Let me know if this helps.

Cheers!

gael__urbauer · 09-07-2023

Hi tj,

I already knew about these options, but sadly, as mentioned in my post, all the queries are run by the same service user.

Hence I have no way to use the user limit factor.

I would like to have a query limit factor in Hive, or some other way to prevent a single query from using too much capacity, even when that capacity is available.
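To make the request concrete: a "query limit factor" would cap each application at a fraction of what its user may hold, so a second query from the same user still finds room. The `app_limit_factor` parameter and the function below are hypothetical names used for illustration, not a documented Hive or YARN property; the sketch assumes the user's queries are served in submission order.

```python
def capped_allocation(user_cap: float, demands: list[float],
                      app_limit_factor: float) -> list[float]:
    """Share one user's cap between that user's queries, in submission order.

    app_limit_factor (hypothetical) caps every single query at
    app_limit_factor * user_cap, even when more capacity is idle.
    """
    per_app_cap = user_cap * app_limit_factor
    allocation = []
    remaining = user_cap
    for demand in demands:
        granted = min(demand, per_app_cap, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


# The "service_dwh" user may hold 40% of the cluster; one query may hold half of that.
print(capped_allocation(40, [90, 5], app_limit_factor=0.5))  # [20.0, 5]
```

The trade-off is the one the question accepts: a lone heavy query leaves capacity idle instead of borrowing it.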

Gael

Fawze · 10-19-2023

Why not use resource pools and sub-pools? If it is a specific query, create a resource pool or sub-pool for it and pass that pool when running the query.
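A rough sketch of why this helps: if each pool's maximum is a fraction of its parent's maximum (roughly how the Capacity Scheduler's `maximum-capacity` composes down a queue hierarchy), routing the heavy query to a small sub-pool bounds it no matter how idle the rest of the cluster is. The pool names and fractions below are hypothetical. In Hive on Tez, the target queue for a session is typically chosen with `SET tez.queue.name=<queue>;`.

```python
def effective_max(cluster_capacity: float, pool_path: str,
                  max_fraction: dict[str, float]) -> float:
    """Upper bound for a pool, multiplying max fractions from the root down.

    max_fraction maps each full pool path (e.g. "root.dwh") to the share of its
    parent pool's maximum that the pool may use.
    """
    cap = cluster_capacity
    parts = pool_path.split(".")
    for depth in range(1, len(parts) + 1):
        cap *= max_fraction[".".join(parts[:depth])]
    return cap


# Hypothetical hierarchy: the DWH pool may use 60% of the cluster; inside it,
# heavy queries are limited to 25% of that and ad-hoc queries to 75%.
pools = {"root": 1.0, "root.dwh": 0.6, "root.dwh.heavy": 0.25, "root.dwh.adhoc": 0.75}
print(effective_max(100, "root.dwh.heavy", pools))   # 15.0
print(effective_max(100, "root.dwh.adhoc", pools))   # 45.0
```

This works even with a single service user, because the limit attaches to the pool the query is submitted to rather than to the user.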

Cloudera Community

Limit Hive query resource consumption in YARN