Created on 08-31-2018 04:40 AM - edited 09-16-2022 06:39 AM
Despite executing a query with a memory limit set (say 5 GB), and despite the Impala daemon having sufficient memory, the query gets cancelled with a 'memory limit exceeded' error.
The high-level error is below:
Memory limit exceeded: Error occurred on backend ***** by fragment e4a777f7045b132:53b698be0000004c
Memory left in process limit: 59.51 GB
Memory left in query limit: -218.77 KB
Query(e4a777f7045b132:53b698be00000000): memory limit exceeded. Limit=5.00 GB Reservation=3.93 GB ReservationLimit=4.00 GB OtherMemory=1.07 GB Total=5.00 GB Peak=5.00 GB
  Unclaimed reservations: Reservation=241.94 MB OtherMemory=0 Total=241.94 MB Peak=608.38 MB
  Fragment e4a777f7045b132:53b698be0000007f: Reservation=0 OtherMemory=220.02 KB Total=220.02 KB Peak=1.14 MB
Note: the query was executed with a mem_limit of 5 GB. I believe that even if it exceeds the query limit, it should spill data to disk rather than have the query cancelled.
Can someone explain this error? The query fails despite the process having enough memory for execution.
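For context, a per-query limit like this is typically applied as a session-level query option before running the statement. A minimal sketch (the table and query here are illustrative, not the actual workload):

    SET MEM_LIMIT=5gb;  -- per-query cap; operators are expected to spill to disk before exceeding it

    -- illustrative aggregation; GROUP BY hash tables are among the operators that can spill
    SELECT customer_id, COUNT(*) AS orders
    FROM sales
    GROUP BY customer_id;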
Created 09-06-2018 03:21 AM
I have attached the profile herewith.
https://docs.google.com/document/d/1EGcllvV0okfXUluAtvXq4b5Bo9Z9deMG87HP2BKbPh0/edit?usp=sharing
Please let us know what you find from it.
Created on 09-06-2018 11:24 AM - edited 09-06-2018 11:26 AM
@Tomas79 Impala should definitely be able to stay under a query mem_limit by spilling to disk! I've put a lot of work into making that work more reliably over the last couple of years. I don't know what version you were experimenting with, but it's gotten much better over time. We had incremental improvements in most releases from CDH5.5 -> CDH5.12, then a big improvement in CDH5.13 (there was a big rework of the spill-to-disk code), and I expect we'll see another big increase in reliability in CDH6.1 based on the work we've already done.
My rules of thumb for what to expect are:
The most common reason I see for queries failing to spill is that there are concurrent queries running without mem_limits set, or the mem_limits add up to more than the process mem_limit. This has gotten better (and 6.1 should have further improvements), but we do rely on memory-based admission control being configured, with mem_limits set, to completely avoid OOM scenarios from multiple queries competing for the same memory.
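To make the oversubscription scenario concrete, here is a sketch with illustrative numbers (not taken from this thread):

    -- Suppose each impalad runs with a 60 GB process mem_limit. Three concurrent
    -- queries with effective 25 GB limits (or no limits at all) can together demand
    -- 3 x 25 GB = 75 GB > 60 GB and starve each other of memory.
    -- If every query carries an explicit limit and the admitted limits sum to no
    -- more than the process limit, the excess query is queued instead:
    SET MEM_LIMIT=20gb;  -- 3 x 20 GB = 60 GB, within the process limit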
Created 09-07-2018 12:14 AM
Hi, thanks for the clarification. I am working on CDH 5.13+. Maybe I was not precise, but my experience is that Impala very often can't run a query because it requires more memory than it can get.
It is true that a query in the running state rarely fails on "memory limit" if the pools are set up in a non-overlapping mode (i.e. the total memory of the Impala daemons is divided among, for example, 3 pools such as M, L, and XL, and each pool admits only as many queries as its capacity allows - for example, pool XL has a 9GB per-query limit and a 27GB pool size, so you allow 3 queries to run at a time).
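Spelling out that XL example as session options (a sketch; the pool name is illustrative and assumes memory-based admission control is configured for the pool):

    SET REQUEST_POOL='root.XL';  -- pool with 27 GB total and 3 concurrent queries
    SET MEM_LIMIT=9gb;           -- 3 x 9 GB = 27 GB, so the pool is never oversubscribed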
But my complaint was that, in general, many times a query (with up-to-date stats on the tables) simply does not start to run in the corresponding pool (for example pool L with a 2GB limit) because the planner thinks it will need more memory. And this is what I was referring to: as an end user you can't trade this for a disk spill. You can't say, "hey, I know that maybe the sum of the operators in your plan is above 2GB, but please run it in this pool with the 2GB limit, and if you hit the wall, start to spill" (i.e. trade it off for performance). So my experience is that the user has to "upgrade" the query to pool XL and run it there. But then it is like a highway - everybody is driving in the left lane (or the right, for UK readers). And the funny thing is that after the query finishes in the XL pool, you examine the memory allocations and see it would have fit into the 2GB limit.
And @Tim Armstrong, thank you very much for all the efforts. I have quite a bit of experience and I know the history of Impala, so thumbs up for all the improvements in the last versions - looking forward to CDH6!
Created 09-03-2018 01:00 AM
This is also my experience. There is no way to escape from these large, memory-hungry queries by hoping to trade memory for a disk spill and still deliver the results (slower than expected, but still delivered).
Impala just works in memory; you have to accept it. And the memory limit is not there for spilling but for exactly this reason: to kill the query and avoid it consuming all the resources.
So if you were hoping that there is a setting which tells Impala to not consume more memory but still finish the query, I will disappoint you - I did not find anything like that.
Created on 09-07-2018 05:47 AM - edited 09-07-2018 05:48 AM
Hi there,
We are having the same issue.
We are trying to compute incremental stats for a table, even specifying the partitions.
The data volume is 900 MB in 4 files, with around 2,000 columns, saved as Parquet.
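For reference, the command was of this form (the table and partition names here are placeholders, not our actual schema):

    COMPUTE INCREMENTAL STATS my_db.wide_table PARTITION (load_date = '2018-09-01');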
And we got:
Status: Memory limit exceeded:
The memory limit is set too low to initialize spilling operator (id=1).
The minimum required memory to spill this operator is 0.
Error occurred on backend ----- by fragment 494c7492338f9bc1:33554c2400000001
Memory left in process limit: 139.01 GB
Query(494c7492338f9bc1:33554c2400000000): Total=2.14 KB Peak=2.14 KB
Fragment 494c7492338f9bc1:33554c2400000000: Total=8.00 KB Peak=8.00 KB
  EXCHANGE_NODE (id=4): Total=0 Peak=0
    DataStreamRecvr: Total=0 Peak=0
  PLAN_ROOT_SINK: Total=0 Peak=0
  CodeGen: Total=0 Peak=0
Block Manager: Limit=111.32 GB Total=0 Peak=0
Fragment 494c7492338f9bc1:33554c2400000004: Total=546.12 KB Peak=546.12 KB
  AGGREGATION_NODE (id=3): Total=546.12 KB Peak=546.12 KB
    Exprs: Total=508.00 KB Peak=508.00 KB
  EXCHANGE_NODE (id=2): Total=0 Peak=0
    DataStreamRecvr: Total=0 Peak=0
Fragment 494c7492338f9bc1:33554c2400000001: Total=546.12 KB Peak=546.12 KB
  AGGREGATION_NODE (id=1): Total=546.12 KB Peak=546.12 KB
    Exprs: Total=508.00 KB Peak=508.00 KB
  HDFS_SCAN_NODE (id=0): Total=0 Peak=0
This property is set as follows:
-inc_stats_size_limit_bytes=275090541379584000
I cannot believe that with such a small amount of data, Impala is not able to do the math.
I do not know if it has something to do with this formula (but even then, there should be more than enough headroom):
400 bytes of metadata * columns * partitions
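Plugging in our numbers as a rough check (my arithmetic, assuming the 4 files correspond to roughly 4 partitions):

    -- 400 bytes x 2,000 columns x 4 partitions
    SELECT 400 * 2000 * 4 AS estimated_metadata_bytes;  -- 3,200,000 bytes, about 3.2 MB
    -- far below the inc_stats_size_limit_bytes value above, so that cap does not
    -- appear to be what we are hitting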
Any help here would be great.
Thanks in advance! 🙂