Created on 02-14-2017 01:24 AM - edited 09-16-2022 04:05 AM
I followed the Cloudera guide to set up and configure Impala admission control via Dynamic Resource Pools in CDH 5.7 (Impala 2.5.0). One of the first points there is that the admin "should know what kind of workflow is executed and what are the resource requests".
I started with this, capping the queries with a trial-and-error approach. I set a 2GB limit for a particular query via the SET MEM_LIMIT option and it worked fine.
Later the same query failed with a memory limit exceeded error, even though the explain clearly shows that no peak is more than a few MBs. Everything is under 1GB (except the scan, which has 1.08GB).
I had to raise the limit from 2GB to 4GB. Now the query runs ok.
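For reference, this is roughly how I am applying the cap from impala-shell (the 2g/4g values are just the ones from my trial-and-error runs, not recommendations, and my_query stands in for the actual statement):

set mem_limit=2g;   -- first attempt, later failed with memory limit exceeded
set mem_limit=4g;   -- raised limit, the query now completes
my_query;
set mem_limit=0;    -- if I read the docs right, 0 removes the per-session cap again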
What concerns me is that relying on the explain summary is not enough, and I don't know where to look for a safe threshold.
Exec Summary
Operator  #Hosts  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------
37:EXCHANGE  5  74.246ms  367.995ms  810.69K  3.62M  0  0  HASH(CAST(substring(field...
23:SELECT  5  15.465ms  23.090ms  810.69K  3.62M  3.01 MB  0
22:ANALYTIC  5  191.768ms  263.507ms  811.66K  3.62M  23.01 MB  0
21:SORT  5  1s492ms  1s802ms  811.66K  3.62M  112.64 MB  240.00 MB
36:EXCHANGE  5  66.684ms  74.809ms  811.66K  3.62M  0  0  HASH(d.eid)
20:HASH JOIN  5  233.074ms  404.226ms  811.66K  3.62M  2.10 MB  331.00 B  LEFT OUTER JOIN, BROADCAST
|--35:EXCHANGE  5  12.292us  14.792us  71  7  0  0  BROADCAST
|  11:SCAN HDFS  1  6.251ms  6.251ms  71  7  84.00 KB  64.00 MB  trg.table1 tp
19:HASH JOIN  5  45.483ms  57.714ms  811.66K  3.62M  2.10 MB  103.00 B  LEFT OUTER JOIN, BROADCAST
|--34:EXCHANGE  5  15.827us  17.612us  441  2  0  0  BROADCAST
|  10:SCAN HDFS  1  23.226ms  23.226ms  441  2  115.00 KB  80.00 MB  trg.table2 o
18:HASH JOIN  5  81.817ms  103.238ms  811.66K  3.62M  2.27 MB  2.53 KB  LEFT OUTER JOIN, BROADCAST
|--33:EXCHANGE  5  10.905us  16.528us  116  78  0  0  BROADCAST
|  09:SCAN HDFS  1  21.119ms  21.119ms  116  78  77.00 KB  48.00 MB  trg.table3...
17:HASH JOIN  5  238.710ms  310.216ms  810.69K  3.62M  3.09 MB  34.79 KB  LEFT OUTER JOIN, BROADCAST
|--32:EXCHANGE  5  14.653us  17.272us  506  506  0  0  BROADCAST
|  08:SCAN HDFS  1  5.079ms  5.079ms  506  506  121.17 KB  48.00 MB  trg.table4 s1
16:HASH JOIN  5  37.020ms  48.938ms  810.69K  3.62M  2.06 MB  154.41 KB  LEFT OUTER JOIN, BROADCAST
|--31:EXCHANGE  5  7.393us  12.096us  4  3.78K  0  0  BROADCAST
|  07:SCAN HDFS  1  6.101ms  6.101ms  4  3.78K  958.15 KB  64.00 MB  tmp.table8 s2
15:HASH JOIN  5  21.639ms  29.371ms  810.69K  3.62M  2.05 MB  154.41 KB  LEFT OUTER JOIN, BROADCAST
|--30:EXCHANGE  5  8.122us  11.664us  1  3.78K  0  0  BROADCAST
|  06:SCAN HDFS  1  8.067ms  8.067ms  1  3.78K  958.15 KB  64.00 MB  tmp.table8 s3
14:HASH JOIN  5  82.441ms  99.157ms  810.69K  3.62M  2.53 MB  154.41 KB  LEFT OUTER JOIN, BROADCAST
|--29:EXCHANGE  5  220.902us  235.376us  11.35K  3.78K  0  0  BROADCAST
|  05:SCAN HDFS  1  5.741ms  5.741ms  11.35K  3.78K  1.09 MB  64.00 MB  tmp.table8 s4
13:HASH JOIN  5  254.708ms  304.909ms  810.69K  3.62M  152.05 MB  8.64 MB  RIGHT OUTER JOIN, PARTITIONED
|--28:EXCHANGE  5  40.477ms  58.477ms  810.69K  82.11K  0  0  HASH(CAST(d.entid AS INT))
|  12:HASH JOIN  7  226.257ms  359.085ms  810.69K  82.11K  2.09 MB  184.00 B  LEFT OUTER JOIN, BROADCAST
|  |--26:EXCHANGE  7  9.622us  11.480us  9  9  0  0  BROADCAST
|  |  03:SCAN HDFS  1  4.366ms  4.366ms  9  9  47.00 KB  32.00 MB  trg.table5 ro
|  02:HASH JOIN  7  23.812ms  35.389ms  810.69K  82.11K  2.20 MB  2.16 MB  LEFT OUTER JOIN, PARTITIONED
|  |--25:EXCHANGE  7  91.400us  131.127us  10.44K  82.11K  0  0  HASH(d_sur.reference,d_sur....
|  |  01:SCAN HDFS  7  410.599ms  520.945ms  10.44K  82.11K  13.21 MB  168.00 MB  tmp.table8_detail d_sur
|  24:EXCHANGE  7  14.274ms  20.817ms  810.69K  82.11K  0  0  HASH(d.reference,d.mnid,d...
|  00:SCAN HDFS  7  495.925ms  697.044ms  810.69K  82.11K  52.63 MB  1.08 GB  tmp.table8_detail d
27:EXCHANGE  5  26.006ms  34.625ms  3.62M  3.62M  0  0  HASH(c.intid)
04:SCAN HDFS  5  239.460ms  369.856ms  3.62M  3.62M  35.71 MB  48.00 MB  tmp.table7 c
Based on this exec summary, one would say the peak is 1.08GB, so a 1.5GB limit should be safe. But it is not even close.
Any thoughts on where to look for the real consumption?
Thanks
Created 02-14-2017 01:41 AM
And even more confusing is this:
The limit is way bigger than the consumption, but the query is still failing:
Memory Limit Exceeded
Query(a04dc11b92580986:99f26ad466311b9f) Limit: Limit=4.00 GB Consumption=1.48 GB
  Fragment a04dc11b92580986:99f26ad466311baa: Consumption=48.03 MB
    SELECT_NODE (id=29): Consumption=0
    ANALYTIC_EVAL_NODE (id=28): Consumption=0
    SORT_NODE (id=27): Consumption=48.02 MB
    EXCHANGE_NODE (id=44): Consumption=0
    DataStreamRecvr: Consumption=0
  Block Manager: Limit=3.20 GB Consumption=528.00 MB
  Fragment a04dc11b92580986:99f26ad466311bb0: Consumption=988.46 MB
    HASH_JOIN_NODE (id=26): Consumption=0
    HASH_JOIN_NODE (id=25): Consumption=0
    HASH_JOIN_NODE (id=24): Consumption=0
    NESTED_LOOP_JOIN_NODE (id=23): Consumption=494.08 MB
      Exprs: Consumption=494.00 MB
    HASH_JOIN_NODE (id=22): Consumption=72.00 KB
    HASH_JOIN_NODE (id=21): Consumption=64.00 KB
    NESTED_LOOP_JOIN_NODE (id=20): Consumption=494.06 MB
      Exprs: Consumption=494.00 MB
    HASH_JOIN_NODE (id=19): Consumption=76.00 KB
      Exprs: Consumption=28.00 KB
    HASH_JOIN_NODE (id=18): Consumption=40.00 KB
    HASH_JOIN_NODE (id=17): Consumption=32.00 KB
    NESTED_LOOP_JOIN_NODE (id=16): Consumption=32.00 KB
    HASH_JOIN_NODE (id=15): Consumption=16.00 KB
    EXCHANGE_NODE (id=31): Consumption=0
    DataStreamRecvr: Consumption=0
    EXCHANGE_NODE (id=32): Consumption=0
    DataStreamRecvr: Consumption=0
    EXCHANGE_NODE (id=33): Consumption=0
Created 02-14-2017 05:05 PM
Can you link the guide you are following? Admission Control and dynamic resource pools are separate but can function together. Unless I am mistaken, MEM_LIMIT is the memory limit for each Impala daemon. OK, I just did a quick read: it can also be used at run time, per query, where it sets the limit for that specific query. Is this where you are setting it? On its own it has nothing to do with Admission Control or DRP.
For DRP you can set default_pool_mem_limit to cap how much memory can be used in a pool used by specific users/groups/queries.
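As a rough sketch (hypothetical values): default_pool_mem_limit and default_pool_max_requests are impalad startup flags that apply when you only use the built-in default pool, e.g.

--default_pool_mem_limit=50g
--default_pool_max_requests=10

With dynamic resource pools the equivalent knobs are the per-pool Max Memory and Default Query Memory Limit settings in Cloudera Manager.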
Created 02-15-2017 05:40 AM
Yes, I am setting the memory limit per query. My point was that even when I set a much higher limit via SET MEM_LIMIT, the query fails.
Example:
I set 3GB. The query fails, and the error message shows only 1GB of consumption. I assume the memory consumption metrics are not accurate, so OK, I assume the query actually ate more than 3GB.
But then I raise the limit to 5GB and the query succeeds. So I check the real execution plan, and there is no evidence of using more than 3GB; let's say there is a peak memory usage of 1.5GB. I don't have the actual exec plans or profiles at hand, but believe me, the output is not correct.
Regarding the admission control, this is the link:
https://blog.cloudera.com/blog/2016/12/resource-management-for-apache-impala-incubating/
"Cloudera recommends that you set memory limits for queries whenever possible. Default memory limits can be set on all queries in a pool (see below), and explicitly specifying the query memory limit will override the default. If you are using admission control pools with restrictions on the maximum memory available to queries in a pool, setting default per-query memory limits is required."
Another Cloudera document states clearly that if I DO NOT set the mem limit, then the default query limit is "allocated". So if I have 100GB of total memory, 5 nodes, and I set 10GB as the default, then if two users run a query at the same time (even tiny ones), a third query will go to the queue and wait.
"
A dynamic resource pool has Max Memory set to 100 GB.
The Default Query Memory Limit for the pool is 10 GB. Therefore, any query running in this pool could use up to 50 GB of memory (default query memory limit * number of Impala nodes).
The maximum number of queries that Impala executes concurrently within this dynamic resource pool is two, which is the most that could be accommodated within the 100 GB Max Memory cluster-wide limit."
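Working that example through with my own numbers: 10 GB default query memory limit x 5 nodes = 50 GB set aside per query, so 100 GB Max Memory / 50 GB = 2 queries admitted at once, and a third query queues no matter how small it actually is.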
So what I would like to do is measure EXACTLY the peak memory and SET the limit to a value that is safe enough for the query to go through.
The problem is that the explain plans and error messages are not correct, or I am not understanding the whole memory limit concept at all.
T.
Created 02-23-2017 06:47 PM
Thomas, you have a legitimate request and concern.
First, there is no perfectly fool-proof solution because the resource consumption is somewhat dependent on what happens at runtime, and not all memory consumption is tracked by Impala (but most is). We are constantly making improvements in this area, though.
1. I'd recommend fixing num_scanner_threads for your queries (see the sketch after point 2). A different number of scanner threads can result in different memory consumption from run to run (depending on what else is going on in the system at the time).
2. The operators of a query do not run one-by-one. Some of them run concurrently (e.g. join builds may execute concurrently). So just looking at the highest peak in the exec summary is not enough. Taking the sum of the peaks over all operators is a safer bet, but tends to overestimate the actual consumption.
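To illustrate both points (a sketch only, with arbitrary values, not tuned recommendations): for (1) you can pin the thread count together with your limit in impala-shell, e.g.

set num_scanner_threads=2;
set mem_limit=4g;
-- then run the query as usual

which makes the scan memory, and therefore the peak you measure, much more repeatable from run to run. For (2), reading your exec summary roughly: the 152 MB partitioned join build, the 113 MB sort and the ~100 MB of scan buffers can all be resident on a node at the same time, so the working set is closer to their sum (very roughly 400 MB per node, plus the smaller broadcast builds) than to any single Peak Mem value, and untracked allocations and run-to-run variation come on top of that.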
Hope this helps!