I have a Hive insert statement which by default uses all available resources in YARN, as it is reading a large volume of data.
I am happy for the query to take longer and use fewer resources so that other users can also have access to compute resources.
I don't want to set up YARN queues, as this is an unusual query and I don't want to permanently restrict the cluster.
If I were using Spark, I could do this quite easily by setting the number of executors. Is there a Hive config that allows me to do this at a query level?
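For comparison, this is roughly the kind of per-job cap I mean on the Spark side. It is only a sketch: the executor count, cores, and memory are placeholder values, and `my_job.py` is a hypothetical script name, not my actual job.

```shell
# Cap a single Spark job on YARN at a fixed number of executors.
# Dynamic allocation is disabled so the job does not grow beyond the cap.
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 2 \
  --executor-memory 4g \
  --conf spark.dynamicAllocation.enabled=false \
  my_job.py
```

I am looking for something equivalent that I can set just for the one Hive statement.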
I have looked at various other posts such as those below, but nothing seems to allow this.
I have also seen this: https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query... - but I am not sure whether changing split sizes is a good idea. Would this then affect the structure of the data written by my query?
Grateful for any suggestions.