Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

impala memory limit exceed

avatar
Explorer

Hi, 

 

When we execute a query with groupBy, Having, etc. clauses, Impala shows this error: 

 

Memory limit exceeded The memory limit is set too low to initialize spilling operator (id=2). The minimum required memory to spill this operator is 272.00 MB.

 

How we can set the minimum required memory?

How we can solve it?

 

Thanks

13 REPLIES 13

avatar

Hi Krishnat,

  It depends on the complexity of the query - that number is per-plan-node, not global. It may need more memory if there are a lot of operators in the plan. Hard to know without seeing the plan or execution summary (or both).

 

Historically there were various bugs that might result in this happening in certain cases but I believe all the fixes  landed in 5.10.

 

I agree that the message and behaviour could be a lot more helpful/actionable - improving spill-to-disk is actually my primary focus right now - I'm very excited about the changes we have in the pipeline.

avatar
Rising Star
Estimated Per-Host Requirements: Memory=628.99MB VCores=3

PLAN-ROOT SINK
|
66:EXCHANGE [UNPARTITIONED]
| hosts=10 per-host-mem=unavailable
| tuple-ids=20,42N row-size=210B cardinality=26977
|
32:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]
| hash predicates: campaign = campaign, carrier = carrier, market = market, sessiontype = sessiontype
| hosts=10 per-host-mem=292.69KB
| tuple-ids=20,42N row-size=210B cardinality=26977
|
|--65:EXCHANGE [HASH(campaign,market,carrier,sessiontype)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=42 row-size=101B cardinality=26977
| |
| 64:AGGREGATE [FINALIZE]
| | output: sum:merge(CASE WHEN carrier_count = 2 THEN samples ELSE 0 END), sum:merge(CAST(samples AS FLOAT)), sum:merge(CASE WHEN carrier_count = 3 THEN samples ELSE 0 END), sum:merge(CASE WHEN carrier_count > 1 THEN samples ELSE 0 END), sum:merge(CASE WHEN carrier_count > 1 THEN sum_total_bandwidth ELSE 0 END)
| | group by: campaign, market, carrier, sessiontype
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=42 row-size=101B cardinality=26977
| |
| 63:EXCHANGE [HASH(campaign,market,carrier,sessiontype)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=42 row-size=101B cardinality=26977
| |
| 31:AGGREGATE [STREAMING]
| | output: sum(CASE WHEN carrier_count = 2 THEN samples ELSE 0 END), sum(CAST(samples AS FLOAT)), sum(CASE WHEN carrier_count = 3 THEN samples ELSE 0 END), sum(CASE WHEN carrier_count > 1 THEN samples ELSE 0 END), sum(CASE WHEN carrier_count > 1 THEN sum_total_bandwidth ELSE 0 END)
| | group by: campaign, market, carrier, sessiontype
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=42 row-size=101B cardinality=26977
| |
| 16:UNION
| | hosts=10 per-host-mem=0B
| | tuple-ids=40 row-size=78B cardinality=26977
| |
| |--62:AGGREGATE [FINALIZE]
| | | output: sum:merge(ltebwcum), count:merge(*)
| | | group by: campaign, market, carrier, sessiontype, carrier_count, tech_mode
| | | having: tech_mode = 'LTECA'
| | | hosts=10 per-host-mem=10.00MB
| | | tuple-ids=38 row-size=94B cardinality=13400
| | |
| | 61:EXCHANGE [HASH(campaign,market,carrier,sessiontype,carrier_count,tech_mode)]
| | | hosts=10 per-host-mem=0B
| | | tuple-ids=38 row-size=94B cardinality=13400
| | |
| | 30:AGGREGATE [STREAMING]
| | | output: sum(ltebwcum), count(*)
| | | group by: a.campaign, a.market, a.carrier, CASE WHEN SESSIONTYPE = 'HTTPDL_CAPACITY_L' THEN 'HTTPDL_CAPACITY' ELSE SESSIONTYPE END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 4 WHEN l_pdschbytes_scc2 > 0 THEN 3 WHEN l_pdschbytes_scc1 > 0 THEN 2 WHEN L_pdschbytes > 0 THEN 1 ELSE 0 END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END
| | | hosts=10 per-host-mem=10.00MB
| | | tuple-ids=38 row-size=94B cardinality=13400
| | |
| | 29:HASH JOIN [INNER JOIN, PARTITIONED]
| | | hash predicates: a.campaign = campaign, a.carrier = carrier, a.market = a.market, a.filename = filename
| | | other predicates: unix_timestamp(udfs.totimestamp(a.time_stamp)) <= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeend))), unix_timestamp(udfs.totimestamp(a.time_stamp)) >= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeinit)))
| | | runtime filters: RF015 <- campaign, RF017 <- a.market, RF016 <- carrier, RF018 <- filename
| | | hosts=10 per-host-mem=1018.71KB
| | | tuple-ids=31,32,34 row-size=468B cardinality=13400
| | |
| | |--60:EXCHANGE [HASH(campaign,carrier,a.market,filename)]
| | | | hosts=10 per-host-mem=0B
| | | | tuple-ids=32,34 row-size=281B cardinality=33804
| | | |
| | | 28:HASH JOIN [INNER JOIN, BROADCAST]
| | | | hash predicates: a.market = market
| | | | hosts=10 per-host-mem=17.62KB
| | | | tuple-ids=32,34 row-size=281B cardinality=33804
| | | |
| | | |--58:EXCHANGE [BROADCAST]
| | | | | hosts=5 per-host-mem=0B
| | | | | tuple-ids=34 row-size=29B cardinality=569
| | | | |
| | | | 57:AGGREGATE [FINALIZE]
| | | | | group by: market
| | | | | hosts=5 per-host-mem=10.00MB
| | | | | tuple-ids=34 row-size=29B cardinality=569
| | | | |
| | | | 56:EXCHANGE [HASH(market)]
| | | | | hosts=5 per-host-mem=0B
| | | | | tuple-ids=34 row-size=29B cardinality=569
| | | | |
| | | | 27:AGGREGATE [STREAMING]
| | | | | group by: market
| | | | | hosts=5 per-host-mem=10.00MB
| | | | | tuple-ids=34 row-size=29B cardinality=569
| | | | |
| | | | 26:SCAN HDFS [mobistat.allstats_packet, RANDOM]
| | | | partitions=1/1 files=6 size=9.32MB
| | | | predicates: bbdo_approved = 1, campaign = '17D1'
| | | | table stats: 51137 rows total
| | | | column stats: all
| | | | hosts=5 per-host-mem=32.00MB
| | | | tuple-ids=33 row-size=53B cardinality=1065
| | | |
| | | 25:SCAN HDFS [mobistat.cdr_packet a, RANDOM]
| | | partitions=4680/10328 files=4681 size=4.70GB
| | | predicates: regexp_like(calldirection, 'HTTPDL_CAPACITY') = TRUE, regexp_like(endresult, 'HTTP SUCCESS') = TRUE, (modpctlte + isnull(modpctlteca, 0)) > 0.999
| | | table stats: 58699689 rows total
| | | column stats: all
| | | hosts=10 per-host-mem=304.00MB
| | | tuple-ids=32 row-size=252B cardinality=2888511
| | |
| | 59:EXCHANGE [HASH(a.campaign,a.carrier,a.market,a.filename)]
| | | hosts=10 per-host-mem=0B
| | | tuple-ids=31 row-size=187B cardinality=23267425
| | |
| | 24:SCAN HDFS [mobistat.psr_packet_cdma a, RANDOM]
| | partitions=2332/3707 files=2332 size=45.71GB
| | predicates: regexp_like(SESSIONTYPE, 'HTTPDL_CAPACITY') = TRUE, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END IS NOT NULL
| | runtime filters: RF015 -> a.campaign, RF017 -> a.market, RF016 -> a.carrier, RF018 -> a.filename
| | table stats: 358488531 rows total
| | column stats: all
| | hosts=10 per-host-mem=608.00MB
| | tuple-ids=31 row-size=187B cardinality=23267425
| |
| 55:AGGREGATE [FINALIZE]
| | output: sum:merge(ltebwcum), count:merge(*)
| | group by: campaign, market, carrier, sessiontype, carrier_count, tech_mode
| | having: tech_mode = 'LTECA'
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=29 row-size=94B cardinality=13577
| |
| 54:EXCHANGE [HASH(campaign,market,carrier,sessiontype,carrier_count,tech_mode)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=29 row-size=94B cardinality=13577
| |
| 23:AGGREGATE [STREAMING]
| | output: sum(ltebwcum), count(*)
| | group by: a.campaign, a.market, a.carrier, CASE WHEN SESSIONTYPE = 'HTTPDL_CAPACITY_L' THEN 'HTTPDL_CAPACITY' ELSE SESSIONTYPE END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 4 WHEN l_pdschbytes_scc2 > 0 THEN 3 WHEN l_pdschbytes_scc1 > 0 THEN 2 WHEN L_pdschbytes > 0 THEN 1 ELSE 0 END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=29 row-size=94B cardinality=13577
| |
| 22:HASH JOIN [INNER JOIN, PARTITIONED]
| | hash predicates: a.campaign = campaign, a.carrier = carrier, a.market = a.market, a.filename = filename
| | other predicates: unix_timestamp(udfs.totimestamp(a.time_stamp)) <= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeend))), unix_timestamp(udfs.totimestamp(a.time_stamp)) >= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeinit)))
| | hosts=10 per-host-mem=1018.71KB
| | tuple-ids=22,23,25 row-size=468B cardinality=13577
| |
| |--53:EXCHANGE [HASH(campaign,carrier,a.market,filename)]
| | | hosts=10 per-host-mem=0B
| | | tuple-ids=23,25 row-size=281B cardinality=33804
| | |
| | 21:HASH JOIN [INNER JOIN, BROADCAST]
| | | hash predicates: a.market = market
| | | hosts=10 per-host-mem=17.62KB
| | | tuple-ids=23,25 row-size=281B cardinality=33804
| | |
| | |--51:EXCHANGE [BROADCAST]
| | | | hosts=5 per-host-mem=0B
| | | | tuple-ids=25 row-size=29B cardinality=569
| | | |
| | | 50:AGGREGATE [FINALIZE]
| | | | group by: market
| | | | hosts=5 per-host-mem=10.00MB
| | | | tuple-ids=25 row-size=29B cardinality=569
| | | |
| | | 49:EXCHANGE [HASH(market)]
| | | | hosts=5 per-host-mem=0B
| | | | tuple-ids=25 row-size=29B cardinality=569
| | | |
| | | 20:AGGREGATE [STREAMING]
| | | | group by: market
| | | | hosts=5 per-host-mem=10.00MB
| | | | tuple-ids=25 row-size=29B cardinality=569
| | | |
| | | 19:SCAN HDFS [mobistat.allstats_packet, RANDOM]
| | | partitions=1/1 files=6 size=9.32MB
| | | predicates: bbdo_approved = 1, campaign = '17D1'
| | | table stats: 51137 rows total
| | | column stats: all
| | | hosts=5 per-host-mem=32.00MB
| | | tuple-ids=24 row-size=53B cardinality=1065
| | |
| | 18:SCAN HDFS [mobistat.cdr_packet a, RANDOM]
| | partitions=4680/10328 files=4681 size=4.70GB
| | predicates: regexp_like(calldirection, 'HTTPDL_CAPACITY') = TRUE, regexp_like(endresult, 'HTTP SUCCESS') = TRUE, (modpctlte + isnull(modpctlteca, 0)) > 0.999
| | table stats: 58699689 rows total
| | column stats: all
| | hosts=10 per-host-mem=304.00MB
| | tuple-ids=23 row-size=252B cardinality=2888511
| |
| 52:EXCHANGE [HASH(a.campaign,a.carrier,a.market,a.filename)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=22 row-size=187B cardinality=23574499
| |
| 17:SCAN HDFS [mobistat.psr_packet_gsm a, RANDOM]
| partitions=2336/6581 files=2336 size=48.59GB
| predicates: regexp_like(SESSIONTYPE, 'HTTPDL_CAPACITY') = TRUE, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END IS NOT NULL
| table stats: 636668107 rows total
| column stats: all
| hosts=10 per-host-mem=608.00MB
| tuple-ids=22 row-size=187B cardinality=23574499
|
48:AGGREGATE [FINALIZE]
| output: sum:merge(carrier_count * samples), sum:merge(samples), sum:merge(sum_256qam), sum:merge(sum_total_frames), sum:merge(sum_total_bandwidth), sum:merge(sum_4tx_samples)
| group by: campaign, market, carrier, sessiontype
| hosts=10 per-host-mem=10.00MB
| tuple-ids=20 row-size=109B cardinality=26977
|
47:EXCHANGE [HASH(campaign,market,carrier,sessiontype)]
| hosts=10 per-host-mem=0B
| tuple-ids=20 row-size=109B cardinality=26977
|
15:AGGREGATE [STREAMING]
| output: sum(carrier_count * samples), sum(samples), sum(sum_256qam), sum(sum_total_frames), sum(sum_total_bandwidth), sum(sum_4tx_samples)
| group by: campaign, market, carrier, sessiontype
| hosts=10 per-host-mem=10.00MB
| tuple-ids=20 row-size=109B cardinality=26977
|
00:UNION
| hosts=10 per-host-mem=0B
| tuple-ids=18 row-size=102B cardinality=26977
|
|--46:AGGREGATE [FINALIZE]
| | output: sum:merge(l_dlnum256qam), sum:merge(CAST(total_frames AS FLOAT)), sum:merge(ltebwcum), sum:merge(CASE WHEN l_dlmaxnumlayer = 4 THEN 1 ELSE 0 END), count:merge(*)
| | group by: campaign, market, carrier, sessiontype, carrier_count, tech_mode
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=16 row-size=118B cardinality=13400
| |
| 45:EXCHANGE [HASH(campaign,market,carrier,sessiontype,carrier_count,tech_mode)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=16 row-size=118B cardinality=13400
| |
| 14:AGGREGATE [STREAMING]
| | output: sum(l_dlnum256qam), sum(CAST((l_dlnum256qam + l_dlnum64qam + l_dlnum16qam + l_dlnumqpsk) AS FLOAT)), sum(ltebwcum), sum(CASE WHEN a.l_dlmaxnumlayer = 4 THEN 1 ELSE 0 END), count(*)
| | group by: a.campaign, a.market, a.carrier, CASE WHEN SESSIONTYPE = 'HTTPDL_CAPACITY_L' THEN 'HTTPDL_CAPACITY' ELSE SESSIONTYPE END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 4 WHEN l_pdschbytes_scc2 > 0 THEN 3 WHEN l_pdschbytes_scc1 > 0 THEN 2 WHEN L_pdschbytes > 0 THEN 1 ELSE 0 END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END
| | hosts=10 per-host-mem=10.00MB
| | tuple-ids=16 row-size=118B cardinality=13400
| |
| 13:HASH JOIN [INNER JOIN, PARTITIONED]
| | hash predicates: a.campaign = campaign, a.carrier = carrier, a.market = a.market, a.filename = filename
| | other predicates: unix_timestamp(udfs.totimestamp(a.time_stamp)) <= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeend))), unix_timestamp(udfs.totimestamp(a.time_stamp)) >= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeinit)))
| | runtime filters: RF005 <- campaign, RF006 <- carrier, RF007 <- a.market, RF008 <- filename
| | hosts=10 per-host-mem=1018.71KB
| | tuple-ids=9,10,12 row-size=488B cardinality=13400
| |
| |--44:EXCHANGE [HASH(campaign,carrier,a.market,filename)]
| | | hosts=10 per-host-mem=0B
| | | tuple-ids=10,12 row-size=281B cardinality=33804
| | |
| | 12:HASH JOIN [INNER JOIN, BROADCAST]
| | | hash predicates: a.market = market
| | | hosts=10 per-host-mem=17.62KB
| | | tuple-ids=10,12 row-size=281B cardinality=33804
| | |
| | |--42:EXCHANGE [BROADCAST]
| | | | hosts=5 per-host-mem=0B
| | | | tuple-ids=12 row-size=29B cardinality=569
| | | |
| | | 41:AGGREGATE [FINALIZE]
| | | | group by: market
| | | | hosts=5 per-host-mem=10.00MB
| | | | tuple-ids=12 row-size=29B cardinality=569
| | | |
| | | 40:EXCHANGE [HASH(market)]
| | | | hosts=5 per-host-mem=0B
| | | | tuple-ids=12 row-size=29B cardinality=569
| | | |
| | | 11:AGGREGATE [STREAMING]
| | | | group by: market
| | | | hosts=5 per-host-mem=10.00MB
| | | | tuple-ids=12 row-size=29B cardinality=569
| | | |
| | | 10:SCAN HDFS [mobistat.allstats_packet, RANDOM]
| | | partitions=1/1 files=6 size=9.32MB
| | | predicates: bbdo_approved = 1, campaign = '17D1'
| | | table stats: 51137 rows total
| | | column stats: all
| | | hosts=5 per-host-mem=32.00MB
| | | tuple-ids=11 row-size=53B cardinality=1065
| | |
| | 09:SCAN HDFS [mobistat.cdr_packet a, RANDOM]
| | partitions=4680/10328 files=4681 size=4.70GB
| | predicates: regexp_like(calldirection, 'HTTPDL_CAPACITY') = TRUE, regexp_like(endresult, 'HTTP SUCCESS') = TRUE, (modpctlte + isnull(modpctlteca, 0)) > 0.999
| | table stats: 58699689 rows total
| | column stats: all
| | hosts=10 per-host-mem=304.00MB
| | tuple-ids=10 row-size=252B cardinality=2888511
| |
| 43:EXCHANGE [HASH(a.campaign,a.carrier,a.market,a.filename)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=9 row-size=207B cardinality=23267425
| |
| 08:SCAN HDFS [mobistat.psr_packet_cdma a, RANDOM]
| partitions=2332/3707 files=2332 size=45.71GB
| predicates: regexp_like(SESSIONTYPE, 'HTTPDL_CAPACITY') = TRUE, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END IS NOT NULL
| runtime filters: RF005 -> a.campaign, RF006 -> a.carrier, RF007 -> a.market, RF008 -> a.filename
| table stats: 358488531 rows total
| column stats: all
| hosts=10 per-host-mem=608.00MB
| tuple-ids=9 row-size=207B cardinality=23267425
|
39:AGGREGATE [FINALIZE]
| output: sum:merge(l_dlnum256qam), sum:merge(CAST(total_frames AS FLOAT)), sum:merge(ltebwcum), sum:merge(CASE WHEN l_dlmaxnumlayer = 4 THEN 1 ELSE 0 END), count:merge(*)
| group by: campaign, market, carrier, sessiontype, carrier_count, tech_mode
| hosts=10 per-host-mem=10.00MB
| tuple-ids=7 row-size=118B cardinality=13577
|
38:EXCHANGE [HASH(campaign,market,carrier,sessiontype,carrier_count,tech_mode)]
| hosts=10 per-host-mem=0B
| tuple-ids=7 row-size=118B cardinality=13577
|
07:AGGREGATE [STREAMING]
| output: sum(l_dlnum256qam), sum(CAST((l_dlnum256qam + l_dlnum64qam + l_dlnum16qam + l_dlnumqpsk) AS FLOAT)), sum(ltebwcum), sum(CASE WHEN a.l_dlmaxnumlayer = 4 THEN 1 ELSE 0 END), count(*)
| group by: a.campaign, a.market, a.carrier, CASE WHEN SESSIONTYPE = 'HTTPDL_CAPACITY_L' THEN 'HTTPDL_CAPACITY' ELSE SESSIONTYPE END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 4 WHEN l_pdschbytes_scc2 > 0 THEN 3 WHEN l_pdschbytes_scc1 > 0 THEN 2 WHEN L_pdschbytes > 0 THEN 1 ELSE 0 END, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END
| hosts=10 per-host-mem=10.00MB
| tuple-ids=7 row-size=118B cardinality=13577
|
06:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: a.campaign = campaign, a.carrier = carrier, a.market = a.market, a.filename = filename
| other predicates: unix_timestamp(udfs.totimestamp(a.time_stamp)) <= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeend))), unix_timestamp(udfs.totimestamp(a.time_stamp)) >= unix_timestamp(udfs.totimestamp(concat(task_date, ' ', timeinit)))
| runtime filters: RF000 <- campaign, RF001 <- carrier
| hosts=10 per-host-mem=1018.71KB
| tuple-ids=0,1,3 row-size=488B cardinality=13577
|
|--37:EXCHANGE [HASH(campaign,carrier,a.market,filename)]
| | hosts=10 per-host-mem=0B
| | tuple-ids=1,3 row-size=281B cardinality=33804
| |
| 05:HASH JOIN [INNER JOIN, BROADCAST]
| | hash predicates: a.market = market
| | hosts=10 per-host-mem=17.62KB
| | tuple-ids=1,3 row-size=281B cardinality=33804
| |
| |--35:EXCHANGE [BROADCAST]
| | | hosts=5 per-host-mem=0B
| | | tuple-ids=3 row-size=29B cardinality=569
| | |
| | 34:AGGREGATE [FINALIZE]
| | | group by: market
| | | hosts=5 per-host-mem=10.00MB
| | | tuple-ids=3 row-size=29B cardinality=569
| | |
| | 33:EXCHANGE [HASH(market)]
| | | hosts=5 per-host-mem=0B
| | | tuple-ids=3 row-size=29B cardinality=569
| | |
| | 04:AGGREGATE [STREAMING]
| | | group by: market
| | | hosts=5 per-host-mem=10.00MB
| | | tuple-ids=3 row-size=29B cardinality=569
| | |
| | 03:SCAN HDFS [mobistat.allstats_packet, RANDOM]
| | partitions=1/1 files=6 size=9.32MB
| | predicates: bbdo_approved = 1, campaign = '17D1'
| | table stats: 51137 rows total
| | column stats: all
| | hosts=5 per-host-mem=32.00MB
| | tuple-ids=2 row-size=53B cardinality=1065
| |
| 02:SCAN HDFS [mobistat.cdr_packet a, RANDOM]
| partitions=4680/10328 files=4681 size=4.70GB
| predicates: regexp_like(calldirection, 'HTTPDL_CAPACITY') = TRUE, regexp_like(endresult, 'HTTP SUCCESS') = TRUE, (modpctlte + isnull(modpctlteca, 0)) > 0.999
| table stats: 58699689 rows total
| column stats: all
| hosts=10 per-host-mem=304.00MB
| tuple-ids=1 row-size=252B cardinality=2888511
|
36:EXCHANGE [HASH(a.campaign,a.carrier,a.market,a.filename)]
| hosts=10 per-host-mem=0B
| tuple-ids=0 row-size=207B cardinality=23574499
|
01:SCAN HDFS [mobistat.psr_packet_gsm a, RANDOM]
partitions=2336/6581 files=2336 size=48.59GB
predicates: regexp_like(SESSIONTYPE, 'HTTPDL_CAPACITY') = TRUE, CASE WHEN l_pdschbytes_scc3 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc2 > 0 THEN 'LTECA' WHEN l_pdschbytes_scc1 > 0 THEN 'LTECA' WHEN L_pdschbytes > 0 THEN 'LTE' ELSE NULL END IS NOT NULL
runtime filters: RF000 -> a.campaign, RF001 -> a.carrier
table stats: 636668107 rows total
column stats: all
hosts=10 per-host-mem=608.00MB
tuple-ids=0 row-size=207B cardinality=23574499

avatar
Rising Star
Tim - See the next post for the TEXT PLAN. Please let me know, if you figure whats causing the error message.

avatar

That query probably has multiple big joins and aggregations and needs more memory to complete. A very rough rule of thumb for minimum memory in releases CDH5.9-CDH5.12 is the following.

 

  1. For each hash join, the minimum of 150MB or the amount of data on the right side of the node (e.g. if you have a few thousand rows on the right side, maybe a MB or two).
  2. For each merge aggregation, the minimum of 300MB or the size of grouped data in-memory (e.g. if you only have a few thousand groups, maybe a MB or two).
  3. For each sort, about 50-60MB
  4. For each analytic, about 20MB

If you add all those up and add another 25% you'll get a ballpark number for how much memory the query will require to execute.

 

I'm working on reducing those numbers and making the system give a clearer yes/no answer on whether it can run the query before it starts executing.