Error Impala admission control

Explorer

We have configured Impala admission control with a memory limit of 460 GB for a specific pool. However, we have noticed that a specific query was using far more memory than this.

 

++++++++++

SELECT 

  • User: xxxx
  • Database: default
  • Query Type: QUERY
  • Coordinator: 
  • Duration: 75.8m
  • Query Status: Memory limit exceeded
  • Admission Result: Admitted immediately
  • Admission Wait Time: 0ms
  • Aggregate Peak Memory Usage: 879.5 GiB
  • Estimated per Node Peak Memory: 2.2 GiB
  • HDFS Bytes Read: 105.7 MiB
  • Memory Accrual: 367 GiB hours
  • Memory Spilled: 38.1 GiB
  • Node with Peak Memory Usage: xxxx
  • Out of Memory: true
  • Per Node Peak Memory Usage: 102.6 GiB
  • Pool: root.impalaxxxxpool
  • Query State: EXCEPTION
  • Threads: CPU Time: 117.46s

++++++++++

Ideally, it should fail once the aggregate memory for that pool crosses 460 GB, but here it seems like it failed only once the total cluster memory was exhausted. Please advise.

8 REPLIES

Cloudera Employee

Did you set a default query memory limit for the pool? If you didn't, then there's no enforcement of memory consumption.

Explorer

No. I have only set "Max Memory". 

 

The Cloudera documentation says:

 

++++++

Note: If you specify Max Memory for an Impala dynamic resource pool, you must also specify the Default Query Memory Limit. Max Memory relies on the Default Query Memory Limit to produce a reliable estimate of overall memory consumption for a query.

++++++

 

Is this what you meant?

Cloudera Employee

Yes, exactly. If you want enforcement of memory consumption then that field needs to have a non-zero value.

 

We're aware this could be easier and more intuitive. We're currently working on some improvements in this area.
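As a rough sketch of how the two settings interact (all numbers below are hypothetical): admission control charges each query its per-host memory limit on every host it runs on, and stops admitting new queries once those charges reach the pool's Max Memory.

-- Hypothetical pool: Max Memory = 460 GB, Default Query Memory Limit = 2 GiB,
-- with queries running on all 20 hosts of the cluster.
-- Charge per query: 20 hosts x 2 GiB = 40 GiB.
-- Concurrent queries admitted: about 460 / 40 = 11 before queuing starts.
-- A session can still override the pool default for its own queries:
SET MEM_LIMIT=4g;  -- per-host cap for subsequent queries in this session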

Explorer

Thanks, Tim, for your reply.

 

However, if set, Impala requests this amount of memory from each node, and the query does not proceed until that much memory is available. This can cause query failures since memory required for queries will vary from query to query.

 

So a query can fail if the specified memory is not available on the nodes, even if the query itself needs less than that.

Explorer

Also, if I use the MEM_LIMIT query option while running a query, will it bypass the "Max Memory" set in the admission control settings? For example, I have set Max Memory to 400 GB and set MEM_LIMIT to 450 GB while running the query. Another question: if MEM_LIMIT is set at the pool level, the number of queries that can execute concurrently will be reduced, right? Since MEM_LIMIT amount of RAM will be reserved for each query.

Cloudera Employee

> Also, if I use the MEM_LIMIT query option while running a query, will it bypass the "Max Memory" set in the admission control settings? For example, I have set Max Memory to 400 GB and set MEM_LIMIT to 450 GB while running the query.

 

The memory limit does not override the pool "Max Memory": admission control won't admit a query if the total of its memory limits across all hosts exceeds the pool's Max Memory. MEM_LIMIT is a per-host number, while "Max Memory" is a cluster-wide number.
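Concretely, with the numbers above (a sketch; the table name is made up):

-- Pool "Max Memory" = 400 GB, cluster-wide.
SET MEM_LIMIT=450g;  -- per-host limit requested for the query
-- Even on a single host, 450 GB > 400 GB, so the total across all hosts
-- exceeds the pool cap and admission control will not admit the query.
SELECT COUNT(*) FROM sales;  -- gets queued/rejected rather than bypassing the cap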

 

> Another question: if MEM_LIMIT is set at the pool level, the number of queries that can execute concurrently will be reduced, right? Since MEM_LIMIT amount of RAM will be reserved for each query.

Yeah, there's a trade-off between admitting more queries and reliably giving each query enough memory to run fast. One thing to keep in mind is that running more queries concurrently doesn't mean higher throughput: if you are already running enough queries to max out CPU or disk, then admitting more concurrently won't improve throughput.

 

> However, if set, Impala requests this amount of memory from each node, and the query does not proceed until that much memory is available. This can cause query failures since memory required for queries will vary from query to query.

This depends a bit on the memory limit and the version of Impala that you're running. If a query gets close to its memory limit, two things can happen: it can slow down because of spilling or a reduced number of threads, or it can fail. If you have MEM_LIMIT set to reasonable values (e.g. 2 GB+), query failures become much less likely because spilling will be reliable. In more recent versions of Impala, we've been reducing the chances of query failure in these cases; for example, CDH 5.13 had a lot of improvements for HASH JOIN, AGGREGATE and SORT.
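One way to pick a sane value is to start from the planner's estimate and leave headroom (a sketch; the query and tables are hypothetical):

-- The plan header includes an estimated per-host memory requirement.
EXPLAIN SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
-- Set a per-host limit comfortably above that estimate, and at least ~2 GB
-- so hash joins, aggregations and sorts can spill instead of failing:
SET MEM_LIMIT=2g;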

Contributor

@Tim Armstrong,

 

I have a follow-up question on this: can a given query exceed the memory consumption specified by the default memory limit (MEM_LIMIT)?

 

Let's suppose the default memory limit is set to 20 GB. If a given query requires more than 20 GB, can the impalad process allocate the additional memory without cancelling the query?

 

So far, we have noticed that the default memory limit sets a hard limit on memory utilization, cancelling the query, and each time we had to set MEM_LIMIT to a higher value and re-run, although the OS had a sufficient amount of memory to allocate.

 

 

Version: CDH 5.8.2

Cloudera Employee

The MEM_LIMIT is a hard limit on the amount of memory that the query can use, and it cannot be re-negotiated during execution. If the default MEM_LIMIT you set does not suffice, you can either increase it OR set the MEM_LIMIT query option to a higher value for that query only.
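For example (a sketch, with a hypothetical value, tables and query), the per-query override in impala-shell would look like:

-- Pool default is 20 GB; raise the limit only for this one heavy query.
SET MEM_LIMIT=30g;
SELECT f.key, COUNT(*) FROM big_fact f JOIN big_dim d ON f.key = d.key GROUP BY f.key;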