Set maximum containers on a Hive query

I have a Hive insert statement that, by default, uses all available resources in YARN because it reads a large volume of data.

 

I am happy for the query to take longer and use fewer resources so that other users can also have access to compute resources.

 

I don't want to set up YARN queues, as this is an unusual query and I don't want to permanently restrict the cluster.

 

If I were using Spark, I could do this quite easily by setting the number of executors. Is there a Hive config that allows me to do this at the query level?

 

I have looked at various other posts such as those below, but nothing seems to allow this.

 https://community.cloudera.com/t5/Support-Questions/How-to-control-number-of-containers-in-a-hive-qu... 

https://community.cloudera.com/t5/Support-Questions/Is-there-a-way-to-set-minimum-maximum-number-of-... 

https://community.cloudera.com/t5/Support-Questions/Can-I-limit-the-number-of-containers-allocated-b... 

 

I have also seen this: https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query... - but I am not sure whether changing split sizes is a good idea. Would this then impact the structure of my stored data?

 

Grateful for any suggestions.

1 ACCEPTED SOLUTION

Hi @Andyjmoss 

 

As you already pointed out, see https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query...

There is no per-query limit on containers. You can only adjust the max and min grouping sizes to influence the number of mapper tasks.

Would this then impact the structure of my stored data?

No, this only affects how much data each map task receives.
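
To make the effect of the grouping sizes concrete, here is a rough sketch in Python. It assumes the query runs on Tez and that the grouping sizes are the session properties tez.grouping.min-size and tez.grouping.max-size. The helper names and the 500 GiB input size are hypothetical, and the bounds are only an idealized estimate: the real task count also depends on file layout and on how Tez groups splits.

```python
import math

MB = 1024 * 1024


def mapper_bounds(total_bytes, min_group_bytes, max_group_bytes):
    """Idealized (fewest, most) map tasks, assuming every grouped split
    ends up between the min and max grouping size."""
    if not 0 < min_group_bytes <= max_group_bytes:
        raise ValueError("need 0 < min_group_bytes <= max_group_bytes")
    fewest = max(1, math.ceil(total_bytes / max_group_bytes))
    most = max(1, math.ceil(total_bytes / min_group_bytes))
    return fewest, most


def session_settings(min_group_bytes, max_group_bytes):
    """Build the per-session SET statements to run before the query.
    Assumption: Hive on Tez, where these two properties control grouping."""
    return [
        f"SET tez.grouping.min-size={min_group_bytes};",
        f"SET tez.grouping.max-size={max_group_bytes};",
    ]


if __name__ == "__main__":
    total = 500 * 1024 * MB  # hypothetical 500 GiB of input data
    min_size, max_size = 1024 * MB, 2048 * MB  # larger groups, fewer mappers
    fewest, most = mapper_bounds(total, min_size, max_size)
    print(f"roughly {fewest} to {most} map tasks")
    for statement in session_settings(min_size, max_size):
        print(statement)
```

Larger grouping sizes mean fewer, longer-running map tasks, so the query can ask for fewer containers at once. This is not a hard cap on containers, and because the settings are applied per session they do not change how the output data is stored.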

2 REPLIES

Thanks @rpathak. Having discussed this further within our team, we are going to try setting up elastic YARN queues to help with this situation.