Created on 06-15-2018 08:52 AM - edited 09-16-2022 06:21 AM
We have configured Impala admission control with a memory limit of 460 GB for a specific pool. However, we have noticed that a specific query was using far more memory than this.
++++++++++
SELECT
++++++
Ideally, the query should fail once the aggregate memory for that pool crosses 460 GB, but here it seems to have failed only once the total cluster memory was exhausted. Please advise.
Created 06-15-2018 11:08 AM
Did you set a default query memory limit for the pool? If you didn't, then there's no enforcement of memory consumption.
Created 06-20-2018 07:43 AM
No. I have only set "Max Memory".
In the Cloudera documentation it says:
++++++
Note: If you specify Max Memory for an Impala dynamic resource pool, you must also specify the Default Query Memory Limit. Max Memory relies on the Default Query Memory Limit to produce a reliable estimate of overall memory consumption for a query.
+++++
Is this what you meant?
Created 06-20-2018 11:11 AM
Yes, exactly. If you want enforcement of memory consumption then that field needs to have a non-zero value.
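For anyone following along, here's a minimal sketch from impala-shell (the 4 GB value is just an illustration; the pool-level Default Query Memory Limit itself is configured on the dynamic resource pool, not via SQL):
++++++
-- List the current query options; a MEM_LIMIT of 0 means no
-- per-query memory enforcement.
SET;

-- Hypothetical non-zero per-host limit for this session, so that
-- memory consumption is actually enforced.
SET MEM_LIMIT=4g;
++++++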
We're aware this could be easier and more intuitive. We're currently working on some improvements in this area.
Created 06-21-2018 06:10 AM
Thanks, Tim, for your reply.
However, if set, Impala requests this amount of memory from each node, and the query does not proceed until that much memory is available. This can cause query failures since memory required for queries will vary from query to query.
A query can fail if the specified memory is not available on the nodes, even if the query itself requires less than that.
Created 06-21-2018 08:08 AM
Also, if I am using the mem_limit query option while running a query, will it bypass the "Max Memory" set in the admission control settings? For example, I have set Max Memory to 400 GB and set mem_limit to 450 GB while running the query. Another question: if mem_limit is set at the pool level, the number of queries that can be executed will be reduced, right? Since mem_limit's worth of RAM will be reserved for each query.
Created 06-21-2018 11:00 AM
> Also, if I am using the mem_limit query option while running a query, will it bypass the "Max Memory" set in the admission control settings? For example, I have set Max Memory to 400 GB and set mem_limit to 450 GB while running the query.
The memory limit does not override the pool "Max Memory" - admission control won't admit a query if the total of its memory limits across all hosts would exceed the pool's Max Memory. mem_limit is a per-host number, while "Max Memory" is a cluster-wide number.
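To make the per-host vs. cluster-wide distinction concrete, here's a hypothetical worked example (the 20-host cluster size and the table name are assumptions, not your setup):
++++++
-- Pool "Max Memory" = 400 GB (cluster-wide). Suppose the query runs
-- on 20 hosts. MEM_LIMIT is per host, so admission control charges
-- 10 GB x 20 hosts = 200 GB against the pool: at most two such
-- queries can be admitted at once. A 450 GB per-host mem_limit
-- would be charged 450 GB per host and could never be admitted.
SET MEM_LIMIT=10g;
SELECT count(*) FROM sales;  -- 'sales' is a placeholder table
++++++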
> Another question: if mem_limit is set at the pool level, the number of queries that can be executed will be reduced, right? Since mem_limit's worth of RAM will be reserved for each query.
Yeah, there's a trade-off between admitting more queries and reliably giving each query enough memory to run fast. One thing to keep in mind is that running more queries concurrently doesn't mean higher throughput - if you are already running enough queries to max out CPU or disk, then admitting more concurrently won't improve throughput.
> However, if set, Impala requests this amount of memory from each node, and the query does not proceed until that much memory is available. This can cause query failures since memory required for queries will vary from query to query.
This depends a bit on the memory limit and the version of Impala that you're running. If a query gets close to its memory limit, two things can happen - it can slow down because of spilling or a reduced number of threads, or it can fail. If you have mem_limits set to reasonable values (e.g. 2 GB+), that makes query failures much less likely, because spilling will be reliable. In more recent versions of Impala, we've been reducing the chances of query failures in these cases. E.g. CDH 5.13 had a lot of improvements for HASH JOIN, AGGREGATE and SORT.
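As a sketch of that advice (the table and column names are made up):
++++++
-- Roughly 2 GB per host is usually enough for the spilling operators
-- (hash join, aggregation, sort) to degrade gracefully by spilling
-- to disk instead of failing the query.
SET MEM_LIMIT=2g;
SELECT customer_id, count(*)
FROM sales                -- hypothetical table
GROUP BY customer_id;
++++++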
Created 09-30-2018 10:24 PM
I have a follow-up question on this: can a given query exceed the memory consumption specified as the default memory limit (MEM_LIMIT)?
Let's suppose the default memory limit is set to 20 GB. If a given query requires more than 20 GB, can the impalad process allocate the additional memory without cancelling the query?
So far, we have noticed that the default memory limit sets a hard limit on memory utilization, thus cancelling the query, and each time we had to set mem_limit to a higher value and re-run, although the OS had a sufficient amount of memory to allocate.
Version : CDH5.8.2
Created 10-10-2018 01:30 PM
MEM_LIMIT is a hard limit on the amount of memory that can be used by the query, and it cannot be re-negotiated during execution. If the default mem_limit that you set does not suffice, you can either increase it, OR you can set the mem_limit query option to a higher value only for that query.
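For example, a quick sketch of the second option (the 30 GB value and the table are illustrative only):
++++++
-- The pool default is 20 GB; raise the per-host limit just for this
-- memory-hungry query. The setting lasts for the session, so either
-- set it back afterwards or run the query in its own session.
SET MEM_LIMIT=30g;
SELECT * FROM big_table ORDER BY event_time;  -- placeholder query
++++++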