Member since 11-22-2021 | 2 Posts | 0 Kudos Received | 0 Solutions
11-26-2021 01:25 AM
Thanks @rpathak. Having discussed this further within our team, we are going to try setting up elastic YARN queues to help with this situation.
11-22-2021 01:31 AM
I have a Hive INSERT statement that, by default, uses all available YARN resources because it reads a large volume of data.
I am happy for the query to take longer and use fewer resources so that other users can also get access to compute resources.
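To make the "take longer but use fewer resources" trade-off concrete, here is a minimal back-of-the-envelope sketch. It assumes an idealised stage in which every task takes the same time and a freed container is reused immediately; real jobs also have skew, container start-up overhead and scheduler delays. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math


def estimate_stage_runtime(num_tasks, max_concurrent, seconds_per_task):
    """Idealised wall-clock time of one stage when at most
    max_concurrent containers can run at once (hypothetical helper).

    Assumes uniform task durations and instant container reuse.
    """
    if num_tasks < 0 or max_concurrent <= 0 or seconds_per_task < 0:
        raise ValueError("invalid arguments")
    # Tasks run in "waves": each wave uses up to max_concurrent containers.
    waves = math.ceil(num_tasks / max_concurrent)
    return waves * seconds_per_task


# 1000 tasks of 60 s each: the total work is 60,000 container-seconds either way,
# but capping concurrency stretches the runtime instead of saturating the cluster.
print(estimate_stage_runtime(1000, 500, 60))  # 2 waves
print(estimate_stage_runtime(1000, 100, 60))  # 10 waves
```

Under these assumptions, limiting concurrency does not reduce the total work; it spreads the same container-seconds over a longer period, leaving headroom for other users.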
I don't want to set up YARN queues, because this is an unusual query and I don't want to permanently restrict the cluster.
If I were using Spark, I could do this quite easily by setting the number of executors. Is there a Hive configuration that allows me to do this at the query level?
I have looked at various other posts, such as those below, but none of them seem to allow this.
https://community.cloudera.com/t5/Support-Questions/How-to-control-number-of-containers-in-a-hive-query/td-p/297734
https://community.cloudera.com/t5/Support-Questions/Is-there-a-way-to-set-minimum-maximum-number-of-containers/m-p/190660#M152749
https://community.cloudera.com/t5/Support-Questions/Can-I-limit-the-number-of-containers-allocated-by-Tez/m-p/157020#M119433
I have also seen https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915 but I am not sure whether changing split sizes is a good idea. Would this affect the structure of my stored data?
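The link above concerns how split sizes influence the number of mappers (map tasks) a query gets. The sketch below is a simplified model of that relationship, loosely based on how Tez split grouping keeps each group's size between `tez.grouping.min-size` and `tez.grouping.max-size`. It is an assumption, not Tez's actual code: the real grouper also accounts for cluster headroom, split waves and data locality, which are ignored here. The 50 MB and 1 GB defaults used below are assumed values; check your cluster's configuration. The function name and the `desired_splits` parameter are hypothetical.

```python
import math

MB = 1024 * 1024


def estimate_grouped_splits(total_bytes, desired_splits,
                            min_group_bytes=50 * MB, max_group_bytes=1024 * MB):
    """Rough estimate of how many grouped splits (map tasks) might be created.

    Simplified model (hypothetical): start from a desired split count, then
    adjust it so that each group's size lies between min_group_bytes and
    max_group_bytes.
    """
    if desired_splits <= 0 or min_group_bytes > max_group_bytes:
        raise ValueError("invalid arguments")
    if total_bytes <= 0:
        return 0
    per_group = total_bytes / desired_splits
    if per_group > max_group_bytes:
        # Groups would be too large: raise the count so each group <= max size.
        return math.ceil(total_bytes / max_group_bytes)
    if per_group < min_group_bytes:
        # Groups would be too small: lower the count so each group >= min size.
        return max(1, total_bytes // min_group_bytes)
    return desired_splits


input_size = 500 * 1024 * MB  # 500 GB of input
print(estimate_grouped_splits(input_size, 400))
print(estimate_grouped_splits(input_size, 400,
                              min_group_bytes=2048 * MB, max_group_bytes=4096 * MB))
```

In this model, raising the minimum group size from 50 MB to 2 GB (and the maximum accordingly) cuts the estimated mapper count from 500 to 250. Fewer, larger map tasks mean fewer concurrent containers for that stage, at the cost of more work per task.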
Grateful for any suggestions.