New Contributor
Posts: 6
Registered: ‎11-15-2018

Protecting queries in Impala using pools and mem limit

Hello.

We currently have a number of classes of users leveraging Impala.  We're running into issues where some users create queries that exhaust available Impala memory and then impact other queries, potentially taking them down as we encounter OOM errors.

 

Our goal is to leverage a default memory limit per query (potentially across pools) to prevent rogue queries from running unchecked.  However, when we set this value to some fraction of total Impala memory per node (say, 128GB) we end up with Impala attempting to RESERVE that amount of memory for each query, which is not the effect we're looking for.

 

Since we're using resource management (not Llama, but Admission Control), I believe the following is relevant: 

 

When resource management is enabled, the mechanism for this option changes. If set, it overrides the automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the query does not proceed until that much memory is available. The actual memory used by the query could be lower, since some queries use much less memory than others. With resource management, the MEM_LIMIT setting acts both as a hard limit on the amount of memory a query can use on any node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query is being executed.

 

I guess my question is: can we use resource management for Impala AND have mem_limit actually be a simple limit-per-query and NOT a reservation-per-query?  Do we absolutely need to turn off resource management for Impala if we want mem_limit to behave as a limit and not a reservation?  If so, what exactly do we need to turn off?  Since Llama is no longer relevant in the Impala universe, the actual setting I'm supposed to toggle is a little obscure.  Is it actually Admission Control we need to turn off, or something else?

 

Thanks!

Mike

New Contributor
Posts: 6
Registered: ‎11-15-2018

Re: Protecting queries in Impala using pools and mem limit

Actually, in re-reading what I wrote, I'm further confused...

 

When resource management is enabled, the mechanism for this option changes. If set, it overrides the automatic memory estimate from Impala. Impala requests this amount of memory from YARN

 

We are not using Impala on YARN (i.e., Llama).  I don't think it's physically possible to use this feature anymore (we're running CDH 5.11).  So how can setting the default Impala mem_limit be causing a reservation of memory versus a simple over-the-limit check during execution?

 

Thanks again,

M

 

Cloudera Employee
Posts: 368
Registered: ‎07-29-2015

Re: Protecting queries in Impala using pools and mem limit

Yes, the YARN reference is a documentation error that we've fixed since then - sorry about that.
Cloudera Employee
Posts: 368
Registered: ‎07-29-2015

Re: Protecting queries in Impala using pools and mem limit

If you set "Default Pool Memory Limit" and *don't* set the "Maximum Memory" for a resource pool, you'll get the effect you're looking for: the memory consumption per query is limited but it's not reserved. So that does help with the "single runaway query" scenario you're talking about. The downside of that configuration is that you don't really have a guarantee that any query gets the memory it's entitled to; you could still exhaust memory with multiple concurrent large queries.

I could go on about this stuff a lot more. We have a bunch of improvements in the pipeline: the 6.1 release has some work slated that gives you more flexibility in how much memory queries of different sizes get, instead of the one-size-fits-all memory limit.
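
To make the difference between the two configurations concrete, here is a toy Python model of the behavior described above. It is only a sketch, not Impala's actual admission code: the `Pool` class, `try_admit`, `over_limit`, and the 10-node cluster size are all assumptions made up for illustration, and the only figure taken from this thread is the 128GB per-node limit. The model assumes that, when a pool has a maximum memory set, each admitted query is counted against it as the per-node limit times the number of nodes.

```python
from dataclasses import dataclass
from typing import Optional

GB = 1024 ** 3


@dataclass
class Pool:
    # Per-query, per-node cap (the "Default Pool Memory Limit" in this thread).
    default_query_mem_limit: int
    # Pool-wide cap ("Maximum Memory"); None means it is left unset.
    max_memory: Optional[int] = None
    # Memory currently counted against the pool by admitted queries.
    reserved: int = 0


def try_admit(pool: Pool, num_nodes: int) -> bool:
    """Return True if this toy model admits a query that runs on num_nodes nodes."""
    if pool.max_memory is None:
        # No pool cap: nothing is reserved, the limit is only checked at runtime.
        return True
    needed = pool.default_query_mem_limit * num_nodes
    if pool.reserved + needed > pool.max_memory:
        return False  # in this model the query would wait in the queue
    pool.reserved += needed
    return True


def over_limit(pool: Pool, per_node_usage: int) -> bool:
    """Runtime check: True if a query's per-node usage exceeds its limit."""
    return per_node_usage > pool.default_query_mem_limit


# Hypothetical 10-node cluster with a 128GB per-node query limit.
capped = Pool(default_query_mem_limit=128 * GB, max_memory=1280 * GB)
uncapped = Pool(default_query_mem_limit=128 * GB)

print(try_admit(capped, 10), try_admit(capped, 10))
print(try_admit(uncapped, 10), try_admit(uncapped, 10))
print(over_limit(uncapped, 200 * GB))
```

In the capped pool the first query accounts for the whole pool, so the second one cannot be admitted even if the first is barely using any memory. In the uncapped pool both are admitted and only the runtime check protects against a single runaway query, which is also why several large queries together can still exhaust memory.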
New Contributor
Posts: 6
Registered: ‎11-15-2018

Re: Protecting queries in Impala using pools and mem limit

Thanks, Tim! Leaving max memory set while I was testing this was my fatal flaw.

Definitely agree on the multiple concurrent rogue queries creating problems. We're actually working on a bot to periodically poll and purge queries that appear to be going off the rails. Looking forward to upgrading to 6.x for additional bulwarks inside CM.

Again, thanks!!
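
For anyone curious, the selection step of the polling bot mentioned above could look roughly like the sketch below. Everything in it is hypothetical: the `RunningQuery` record, its field names, and the thresholds are invented for illustration, and the part that actually fetches the running queries and cancels them is deliberately left out, since it depends on how you monitor your cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningQuery:
    # Hypothetical record; a real bot would fill this in from whatever
    # monitoring source it polls.
    query_id: str
    user: str
    runtime_s: float
    peak_mem_bytes: int


def select_rogue_queries(
    queries: List[RunningQuery],
    max_runtime_s: float,
    max_mem_bytes: int,
) -> List[RunningQuery]:
    """Return the queries that exceed either the runtime or the memory threshold."""
    return [
        q
        for q in queries
        if q.runtime_s > max_runtime_s or q.peak_mem_bytes > max_mem_bytes
    ]


GB = 1024 ** 3

# Made-up sample data for one polling cycle.
snapshot = [
    RunningQuery("q1", "analyst", runtime_s=45.0, peak_mem_bytes=2 * GB),
    RunningQuery("q2", "etl", runtime_s=7200.0, peak_mem_bytes=4 * GB),
    RunningQuery("q3", "adhoc", runtime_s=300.0, peak_mem_bytes=150 * GB),
]

for q in select_rogue_queries(snapshot, max_runtime_s=3600.0, max_mem_bytes=100 * GB):
    # A real bot would cancel the query here instead of printing.
    print("would cancel", q.query_id, "run by", q.user)
```

Keeping the selection logic as a pure function makes it easy to test the thresholds before letting the bot cancel anything for real.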