Support Questions


Impala Memory limit exceeded despite setting memory limit

Explorer

Despite executing a query with a memory limit set (say 5 GB) and the Impala daemon having sufficient memory, the query gets cancelled owing to the error 'memory limit exceeded'.

 

High-level error below:

 

Memory limit exceeded: Error occurred on backend ***** by fragment e4a777f7045b132:53b698be0000004c
Memory left in process limit: 59.51 GB
Memory left in query limit: -218.77 KB
Query(e4a777f7045b132:53b698be00000000): memory limit exceeded. Limit=5.00 GB Reservation=3.93 GB ReservationLimit=4.00 GB OtherMemory=1.07 GB Total=5.00 GB Peak=5.00 GB
  Unclaimed reservations: Reservation=241.94 MB OtherMemory=0 Total=241.94 MB Peak=608.38 MB
  Fragment e4a777f7045b132:53b698be0000007f: Reservation=0 OtherMemory=220.02 KB Total=220.02 KB Peak=1.14 MB

 

Note: the query was executed with a mem_limit of 5 GB. I believe that even if it exceeds the query limit, it should spill data to disk rather than cancel the query.

 

Can someone explain this error? Despite the process having enough memory for execution, the query fails.
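
For reference, the limit was applied per query, e.g. via the MEM_LIMIT query option, along the lines of this sketch (the table and query below are placeholders, not the actual workload):

  -- assumed to be run in impala-shell; my_table and the aggregation are placeholders
  SET MEM_LIMIT=5g;              -- per-query memory cap on each impalad
  SELECT col_a, COUNT(*)
  FROM my_table
  GROUP BY col_a;                -- the kind of operator that was expected to spill to disk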

 

7 REPLIES

Super Guru
Not all operations in an Impala query will spill to disk; please refer to the doc below:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_scalability.html#spill_to_dis...

Can you share the full query PROFILE so we can see which operator ran out of memory?
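
If it helps, one way to capture it (a sketch, assuming access to impala-shell; the Cloudera Manager query details page is an alternative):

  -- in impala-shell: re-run the failing query, then request its runtime profile
  SET MEM_LIMIT=5g;
  SELECT ... ;   -- the failing query, left as a placeholder here
  PROFILE;       -- prints the full profile of the most recent query in this session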

Explorer

I have attached the profile herewith.

 

https://docs.google.com/document/d/1EGcllvV0okfXUluAtvXq4b5Bo9Z9deMG87HP2BKbPh0/edit?usp=sharing

 

 

Please let us know what you find from it.


@Tomas79 Impala should definitely be able to stay under a query mem_limit by spilling to disk! I've put a lot of work into making that more reliable over the last couple of years. I don't know what version you were experimenting with, but it has gotten much better over time. We had incremental improvements in most releases from CDH5.5 -> CDH5.12, then a big improvement in CDH5.13 (there was a big rework of the spill-to-disk code), and I expect we'll see another big increase in reliability in CDH6.1 based on the work we've already done.

 

My rules of thumb for what to expect are:

 

  • CDH5.9+: spilling is fairly reliable if the query has plenty of memory (300 MB+ for each spilling aggregation and join, and adequate memory for the other query operators).
  • CDH5.13+: spilling of aggregations, hash joins and sorts (i.e. the most memory-hungry operators) is more reliable because those operators reserve memory for spilling and have lower minimum memory requirements. Other operators like SCAN still need adequate memory to run in-memory. "Memory limit exceeded" should be rare if each query has a mem_limit of multiple GBs, e.g. 2GB+.
  • CDH6.1+: it should be much harder to hit a "memory limit exceeded" error after a query starts running; we've fixed a lot of edge cases, particularly in the SCAN operators.

 

The most common reason I see for queries failing to spill is that there are concurrent queries running without mem_limits set, or the mem_limits add up to more than the process mem_limit. This has gotten better (and 6.1 should have further improvements), but to completely avoid OOM scenarios from multiple queries competing for the same memory, we do rely on memory-based admission control being configured with mem_limits set.
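
A hypothetical back-of-the-envelope check of that failure mode (the numbers here are made up for illustration, not taken from this thread):

  process mem_limit per impalad:          60 GB
  per-query MEM_LIMIT:                     5 GB
  queries the pool admits concurrently:   20
  worst-case demand: 20 x 5 GB = 100 GB  >  60 GB  ->  queries can still be killed

  a safe setting would admit at most floor(60 / 5) = 12 such queries per daemon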


Hi, thanks for the clarification. I am working on CDH 5.13+. Maybe I was not precise, but my experience is that Impala very often can't run a query because it requires more memory than it can get.

It is true that a query in the running state rarely fails on "memory limit" if the pools are set up in a non-overlapping way (i.e. the total memory of the Impala daemons is divided among, for example, 3 pools such as M, L and XL, and each pool admits only as many queries as fit into it, e.g. XL has a 9 GB per-query limit, the pool size is 27 GB, and you allow 3 queries to run at a time).

But my complaint was that, in general, many times a query (with up-to-date stats on its tables) simply does not start running in the corresponding pool (for example pool L with a 2 GB limit) because the planner thinks it will need more memory. This is what I was referring to: as an end user you can't trade this for a disk spill, can't say "hey, I know the sum of the operators in your plan may be above 2 GB, but please run it in this pool with the 2 GB limit, and if you hit the wall, start to spill" (i.e. trade memory for performance). So in my experience the user has to "upgrade" the query to pool XL and run it there. But then it is like a highway: everybody is driving in the left lane (or the right one, for UK readers). And the funny thing is that after the query finishes in the XL pool, you examine the memory allocations and it would have fit into the 2 GB limit.
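
To make it concrete, the pattern I keep running into looks roughly like this (a hypothetical sketch; the table, query and numbers are made up):

  -- pool L allows 2 GB per query, but the planner estimates more than that
  EXPLAIN SELECT customer_id, COUNT(*) FROM big_fact GROUP BY customer_id;
  -- the estimated per-host memory comes back as, say, 3 GB, so the query is
  -- rejected from pool L and has to be "upgraded" to pool XL, where the profile
  -- afterwards shows a peak well under 2 GB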

 

And @Tim Armstrong, thank you very much for all the effort. I have quite a bit of experience and know the history of Impala, so thumbs up for all the improvements made in the recent versions - looking forward to CDH6!


This is also my experience. There is no way to escape from these large, memory-hungry queries by hoping that you can trade them off for a disk spill and still deliver the results (slower than expected, but still delivered).

 

Impala just works in memory; you have to accept it. And the memory limit is not there for spilling but for exactly this reason: to kill the query and avoid it consuming all the resources.

 

So if you were hoping there is a setting that can tell Impala not to consume more memory but still finish the query, I have to disappoint you: I did not find anything like that.

 

New Contributor

Hi there,

 

We are having the same issue.

We are trying to compute incremental stats for a table, even specifying the partitions.

The data volume is 900 MB in 4 files, with around 2,000 columns, saved as Parquet.
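
The statement we run is essentially this (a sketch; the database, table and partition names are placeholders for ours):

  -- wide_db.wide_table and part_col stand in for our real names
  COMPUTE INCREMENTAL STATS wide_db.wide_table PARTITION (part_col = '2018-01-01');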

 

And we got:

Status: Memory limit exceeded: 
The memory limit is set too low to initialize spilling operator (id=1).
The minimum required memory to spill this operator is 0.
Error occurred on backend ----- by fragment 494c7492338f9bc1:33554c2400000001
Memory left in process limit: 139.01 GB
Query(494c7492338f9bc1:33554c2400000000): Total=2.14 KB Peak=2.14 KB
  Fragment 494c7492338f9bc1:33554c2400000000: Total=8.00 KB Peak=8.00 KB
    EXCHANGE_NODE (id=4): Total=0 Peak=0
    DataStreamRecvr: Total=0 Peak=0
    PLAN_ROOT_SINK: Total=0 Peak=0
    CodeGen: Total=0 Peak=0
  Block Manager: Limit=111.32GB Total=0 Peak=0
  Fragment 494c7492338f9bc1:33554c2400000004: Total=546.12 KB Peak=546.12 KB
    AGGREGATION_NODE (id=3): Total=546.12 KB Peak=546.12 KB
      Exprs: Total=508.00 KB Peak=508.00 KB
    EXCHANGE_NODE (id=2): Total=0 Peak=0
    DataStreamRecvr: Total=0 Peak=0
  Fragment 494c7492338f9bc1:33554c2400000001: Total=546.12 KB Peak=546.12 KB
    AGGREGATION_NODE (id=1): Total=546.12 KB Peak=546.12 KB
      Exprs: Total=508.00 KB Peak=508.00 KB
    HDFS_SCAN_NODE (id=0): Total=0 Peak=0

  

This property is set as follows:

-inc_stats_size_limit_bytes=275090541379584000

  

I can't believe that with such a small amount of data Impala is not able to do the math.
I do not know if it has something to do with this formula (but still, there is more than enough memory to get through it):

400 bytes of metadata * column * partition
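
Even being generous with that formula, the math comes out tiny: with 4 files there can be at most 4 partitions, so 400 bytes x 2,000 columns x 4 partitions = 3,200,000 bytes, about 3 MB, nowhere near the limits shown in the error above.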

 

Any help here will be great.

Thanks in advance! 🙂

@BorjaRodriguez that looks like a different known issue that we've seen with incremental stats on tables with large numbers of columns. It's fixed in CDH5.13+.