Support Questions

Find answers, ask questions, and share your expertise

Backend 6:Memory Limit Exceeded" in impala 2 (chd 5.2) but same query is working fine in impala 1.1

avatar
Explorer

Hi, I have 2 node (Namenode + 2 Datanode) setup for Hadoop (CDH 5.2) , my servers are centos 6.5 64 bit (2 core with 4GB RAM). I also have one more cluster (CDH 4.2) having similar configuration.

 

Now my problem is that, I have one sql query which is running flawless on CDH 4.2 cluster but giving "Backend 6:Memory Limit Exceeded" in new CDH 5.2 cluster.

 

I have also increased datanode RAM upto 8GB and set MEM_LIMIT=6G in impala, but still getting same error.

 

I have also tried "explain" statement to check memory usage by query and it is showing per node memory 2.3G, then why I am getting this error.

1 ACCEPTED SOLUTION

avatar
Cloudera Employee

Hi Nagendra,

I replied to you over email. We are trying to understand and fix the 'Memory Limit Exceeded' errors. Can you please disable the partitioned_hash_join and partitioned_aggregation, by restarting Impala with those options "--enable_partitioned_aggregation=false --enable_partitioned_hash_join=false" and let me know if the queries run successfully?

Regards,
-Ippokratis.

View solution in original post

19 REPLIES 19

avatar
Contributor

I meant the query profile of query you run in Impala 2.0 by setting MEM_LIMIT=15G. Just post the section with the estimated/peak memory usage per node. Is it the same profile as the one you posted earlier?

 

Dimitris

avatar
Explorer

Hi Dimitris,

 

Query profile posted earlier was for failing query, I will post query profile of query with MEM_LIMIT=15G which was executed successfully.

 

Regards,

Nagendra Rai

avatar
Explorer

 

Hi Dimitris,

 

Below are the query profile details (some part of it) of query that is running successfully by increasing server memory. I have also checked PerHostPeakMemUsage requirement and it never goes beyond 1.39GB.

 

 

Estimated Per-Host Requirements: Memory=14.43GB VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
default.mst_location


Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------------------------- 23:MERGING-EXCHANGE 1 331.94us 331.94us 5.20K 10.00K 0 -1.00 B UNPARTITIONED 11:TOP-N 1 12.100ms 12.100ms 5.20K 10.00K 1.00 MB 705.27 KB 22:AGGREGATE 1 255.443ms 255.443ms 5.20K 80.53M 6.38 MB 128.00 MB FINALIZE 21:EXCHANGE 1 240.795us 240.795us 5.20K 80.53M 0 0 HASH(d.ip_location,c.access... 10:AGGREGATE 1 256.762ms 256.762ms 5.20K 80.53M 6.28 MB 128.00 MB 09:HASH JOIN 1 14.843ms 14.843ms 10.04K 80.53M 4.09 MB 2.00 GB LEFT OUTER JOIN, BROADCAST |--20:EXCHANGE 1 7.663us 7.663us 11 -1 0 0 BROADCAST | 00:SCAN HDFS 1 252.704ms 252.704ms 11 -1 54.00 KB 32.00 MB default.mst_location d 19:AGGREGATE 1 9.9ms 9.9ms 10.04K 80.53M 6.38 MB 7.22 GB FINALIZE 18:EXCHANGE 1 317.121us 317.121us 10.04K 80.53M 0 0 HASH(ip_address,a.access_da... 08:AGGREGATE 1 449.591ms 449.591ms 10.04K 80.53M 13.18 MB 7.22 GB 07:HASH JOIN 1 1s008ms 1s008ms 1.39M 80.53M 7.75 MB 5.68 MB INNER JOIN, BROADCAST |--17:EXCHANGE 1 809.569us 809.569us 2.42K 80.53K 0 0 BROADCAST | 16:AGGREGATE 1 682.727ms 682.727ms 2.42K 80.53K 776.88 MB 10.00 MB FINALIZE | 15:EXCHANGE 1 25.278ms 25.278ms 413.62K 80.53K 0 0 HASH(access_date,a.usr_id,cnt) | 06:AGGREGATE 1 543.432ms 543.432ms 413.62K 80.53K 776.88 MB 10.00 MB | 14:AGGREGATE 1 335.863ms 335.863ms 416.05K 80.53K 776.88 MB 128.00 MB FINALIZE | 13:EXCHANGE 1 25.230ms 25.230ms 416.05K 80.53K 0 0 HASH(a.ip_address,a.ip_loca... | 05:AGGREGATE 1 1s448ms 1s448ms 416.05K 80.53K 776.88 MB 128.00 MB | 04:HASH JOIN 1 246.11ms 246.11ms 1.63M 80.53K 21.29 MB 2.00 GB LEFT OUTER JOIN, BROADCAST | |--12:EXCHANGE 1 10.731us 10.731us 11 -1 0 0 BROADCAST | | 03:SCAN HDFS 1 241.873ms 241.873ms 11 -1 54.00 KB 32.00 MB default.mst_location A | 02:SCAN HDFS 1 27s538ms 27s538ms 1.63M 80.53K 156.00 MB 480.00 MB default.logs_sc1 B 01:SCAN HDFS 1 13s589ms 13s589ms 20.41M 80.53M 82.05 MB 352.00 MB default.logs_sc1 a Query Timeline: 49s402ms - Start execution: 67.955us (67.955us) - Planning finished: 216.759ms (216.691ms) - Submit for admission: 217.69ms (310.21us) - Completed admission: 217.79ms (10.25us) - Ready to start remote fragments: 219.667ms (2.588ms) - Remote fragments started: 2s337ms (2s117ms) - Rows available: 47s448ms (45s111ms) - First row fetched: 47s462ms (13.655ms) - Unregister query: 49s395ms (1s932ms) ImpalaServer: - ClientFetchWaitTimer: 1s934ms - RowMaterializationTimer: 10.625ms Execution Profile a34460384c0f44c8:c196fcdc2790d29d:(Total: 47s225ms, non-child: 0ns, % non-child: 0.00%) Fragment start latencies: count: 8, last: 0.232499, min: 0.212138, max: 0.295248, mean: 0.26005, stddev: 0.0247054 Per Node Peak Memory Usage: server2:22000(1.39 GB) - FinalizationTimer: 0ns Coordinator Fragment F08:(Total: 45s106ms, non-child: 2.303ms, % non-child: 0.01%) MemoryUsage(1s000ms): 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 8.00 KB, 465.84 KB, 220.01 KB ThreadUsage(500.0ms): 1, 1, 1, 1 - AverageThreadTokens: 1.00 - PeakMemoryUsage: 522.87 KB (535416) - PerHostPeakMemUsage: 0 - PrepareTime: 33.461us - RowsProduced: 5.20K (5203) - TotalCpuTime: 49s146ms - TotalNetworkReceiveTime: 0ns - TotalNetworkSendTime: 0ns - TotalStorageWaitTime: 0ns BlockMgr: - BlockWritesOutstanding: 0 - BlocksCreated: 980 - BlocksRecycled: 879 - BufferedPins: 6 - BytesWritten: 8.26 MB (8656438) - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 9.97 GB (10703688704) - PeakMemoryUsage: 1.28 GB (1372799152) - TotalBufferWaitTime: 0ns - TotalEncryptionTime: 0ns - TotalIntegrityCheckTime: 0ns - TotalReadBlockTime: 6.532ms EXCHANGE_NODE (id=23):(Total: 45s104ms, non-child: 331.94us, % non-child: 0.00%) BytesReceived(1s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 233.90 KB, 233.90 KB - BytesReceived: 233.90 KB (239518) - ConvertRowBatchTime: 0ns - DeserializeRowBatchTimer: 701.309us - FirstBatchArrivalWaitTime: 45s103ms - MergeGetNext: 300.125us - MergeGetNextBatch: 30.200us - PeakMemoryUsage: 0 - RowsReturned: 5.20K (5203) - RowsReturnedRate: 115.00 /sec - SendersBlockedTimer: 0ns - SendersBlockedTotalTimer(*): 0ns Averaged Fragment F07:(Total: 46s841ms, non-child: 0ns, % non-child: 0.00%) split sizes: min: 0, max: 0, avg: 0, stddev: 0 completion times: min:46s977ms max:46s977ms mean: 46s977ms stddev:0ns execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec num instances: 1 - AverageThreadTokens: 1.00 - PeakMemoryUsage: 6.42 MB (6728752) - PerHostPeakMemUsage: 1.39 GB (1488882840) - PrepareTime: 253.281ms - RowsProduced: 5.20K (5203) - TotalCpuTime: 661.516ms - TotalNetworkReceiveTime: 46s569ms - TotalNetworkSendTime: 2.51us - TotalStorageWaitTime: 0ns CodeGen:(Total: 638.47ms, non-child: 638.47ms, % non-child: 100.00%) - CodegenTime: 3.817ms - CompileTime: 389.131ms - LoadTime: 247.316ms - ModuleFileSize: 1.94 MB (2032752) DataStreamSender (dst_id=23):(Total: 3.668ms, non-child: 3.668ms, % non-child: 100.00%) - BytesSent: 233.90 KB (239518) - NetworkThroughput(*): 134.03 MB/sec - OverallThroughput: 62.26 MB/sec - PeakMemoryUsage: 4.00 KB (4096) - SerializeBatchTime: 1.828ms - ThriftTransmitTime(*): 1.704ms - UncompressedRowBatchSize: 478.43 KB (489914) SORT_NODE (id=11):(Total: 46s837ms, non-child: 12.100ms, % non-child: 0.03%) - PeakMemoryUsage: 1.00 MB (1052672) - RowsReturned: 5.20K (5203) - RowsReturnedRate: 111.00 /sec AGGREGATION_NODE (id=22):(Total: 46s825ms, non-child: 255.443ms, % non-child: 0.55%) - BuildTime: 1.680ms - GetNewBlockTime: 116.349us - GetResultsTime: 269.927us - HashBuckets: 32.77K (32768) - LargestPartitionPercent: 3 - MaxPartitionLevel: 0 - NumRepartitions: 0 - PartitionsCreated: 32 - PeakMemoryUsage: 6.38 MB (6690520) - PinTime: 0ns - RowsRepartitioned: 0 - RowsReturned: 5.20K (5203) - RowsReturnedRate: 111.00 /sec - SpilledPartitions: 0 - UnpinTime: 10.232us EXCHANGE_NODE (id=21):(Total: 46s569ms, non-child: 46s569ms, % non-child: 100.00%) - BytesReceived: 190.35 KB (194914) - ConvertRowBatchTime: 108.155us - DeserializeRowBatchTimer: 656.948us - FirstBatchArrivalWaitTime: 46s563ms - PeakMemoryUsage: 0 - RowsReturned: 5.20K (5203) - RowsReturnedRate: 111.00 /sec - SendersBlockedTimer: 0ns - SendersBlockedTotalTimer(*): 0ns

 Also query summery details. It has big difference between "Est. Peak Mem" and "Peak Mem"

 

[server2.kotakseconline.com:21000] > summary;
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+-------------------------------------------------------------+
| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                                                      |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+-------------------------------------------------------------+
| 23:MERGING-EXCHANGE | 1      | 331.09us | 331.09us | 5.20K   | 10.00K     | 0 B       | -1 B          | UNPARTITIONED                                               |
| 11:TOP-N            | 1      | 12.10ms  | 12.10ms  | 5.20K   | 10.00K     | 1.00 MB   | 705.27 KB     |                                                             |
| 22:AGGREGATE        | 1      | 255.44ms | 255.44ms | 5.20K   | 80.53M     | 6.38 MB   | 128.00 MB     | FINALIZE                                                    |
| 21:EXCHANGE         | 1      | 240.79us | 240.79us | 5.20K   | 80.53M     | 0 B       | 0 B           | HASH(d.ip_location,c.access_date,c.usr_id)                  |
| 10:AGGREGATE        | 1      | 256.76ms | 256.76ms | 5.20K   | 80.53M     | 6.28 MB   | 128.00 MB     |                                                             |
| 09:HASH JOIN        | 1      | 14.84ms  | 14.84ms  | 10.04K  | 80.53M     | 4.09 MB   | 2.00 GB       | LEFT OUTER JOIN, BROADCAST                                  |
| |--20:EXCHANGE      | 1      | 7.66us   | 7.66us   | 11      | -1         | 0 B       | 0 B           | BROADCAST                                                   |
| |  00:SCAN HDFS     | 1      | 252.70ms | 252.70ms | 11      | -1         | 54.00 KB  | 32.00 MB      | default.mst_location d                                      |
| 19:AGGREGATE        | 1      | 9.01ms   | 9.01ms   | 10.04K  | 80.53M     | 6.38 MB   | 7.22 GB       | FINALIZE                                                    |
| 18:EXCHANGE         | 1      | 317.12us | 317.12us | 10.04K  | 80.53M     | 0 B       | 0 B           | HASH(ip_address,a.access_date,a.usr_id,a.cnt)               |
| 08:AGGREGATE        | 1      | 449.59ms | 449.59ms | 10.04K  | 80.53M     | 13.18 MB  | 7.22 GB       |                                                             |
| 07:HASH JOIN        | 1      | 1.01s    | 1.01s    | 1.39M   | 80.53M     | 7.75 MB   | 5.68 MB       | INNER JOIN, BROADCAST                                       |
| |--17:EXCHANGE      | 1      | 809.57us | 809.57us | 2.42K   | 80.53K     | 0 B       | 0 B           | BROADCAST                                                   |
| |  16:AGGREGATE     | 1      | 682.73ms | 682.73ms | 2.42K   | 80.53K     | 776.88 MB | 10.00 MB      | FINALIZE                                                    |
| |  15:EXCHANGE      | 1      | 25.28ms  | 25.28ms  | 413.62K | 80.53K     | 0 B       | 0 B           | HASH(access_date,a.usr_id,cnt)                              |
| |  06:AGGREGATE     | 1      | 543.43ms | 543.43ms | 413.62K | 80.53K     | 776.88 MB | 10.00 MB      |                                                             |
| |  14:AGGREGATE     | 1      | 335.86ms | 335.86ms | 416.05K | 80.53K     | 776.88 MB | 128.00 MB     | FINALIZE                                                    |
| |  13:EXCHANGE      | 1      | 25.23ms  | 25.23ms  | 416.05K | 80.53K     | 0 B       | 0 B           | HASH(a.ip_address,a.ip_location,access_date,b.usr_id,b.cnt) |
| |  05:AGGREGATE     | 1      | 1.45s    | 1.45s    | 416.05K | 80.53K     | 776.88 MB | 128.00 MB     |                                                             |
| |  04:HASH JOIN     | 1      | 246.01ms | 246.01ms | 1.63M   | 80.53K     | 21.29 MB  | 2.00 GB       | LEFT OUTER JOIN, BROADCAST                                  |
| |  |--12:EXCHANGE   | 1      | 10.73us  | 10.73us  | 11      | -1         | 0 B       | 0 B           | BROADCAST                                                   |
| |  |  03:SCAN HDFS  | 1      | 241.87ms | 241.87ms | 11      | -1         | 54.00 KB  | 32.00 MB      | default.mst_location A                                      |
| |  02:SCAN HDFS     | 1      | 27.54s   | 27.54s   | 1.63M   | 80.53K     | 156.00 MB | 480.00 MB     | default.logs_sc1 B                                          |
| 01:SCAN HDFS        | 1      | 13.59s   | 13.59s   | 20.41M  | 80.53M     | 82.05 MB  | 352.00 MB     | default.logs_sc1 a                                          |
+---------------------+--------+----------+----------+---------+------------+-----------+---------------+-------------------------------------------------------------+

 

Regards,

Nagendra Rai

avatar
Cloudera Employee

Hi Nagendra,

 

Can you please send me (ipandis@cloudera) and Dimitris (dtsirogiannis@cloudera) a direct email with the entire profile log (impala log.INFO) in both configurations, the one that fails when mem_limit < 14GB and the one that succeeds?

We would like to understand why this is happenning, especially whether there is data skew.

Also, can you please try to run compute stats again on the tables that are missing stats (e.g. mst_location), and send us the logs, if compute stats fails?

 

Regards,

-Ippokratis.

 

avatar
Explorer

Hi Ippokratis,

 

I have sent mail, please check.

 

Regards,

Nagendra Rai

avatar
Explorer

Still waiting for reply 😞

avatar
Cloudera Employee

Hi Nagendra,

I replied to you over email. We are trying to understand and fix the 'Memory Limit Exceeded' errors. Can you please disable the partitioned_hash_join and partitioned_aggregation, by restarting Impala with those options "--enable_partitioned_aggregation=false --enable_partitioned_hash_join=false" and let me know if the queries run successfully?

Regards,
-Ippokratis.

avatar
Explorer

Hi Ippokratis,

 

I have added "--enable_partitioned_aggregation=false --enable_partitioned_hash_join=false" in /etc/default/impala and restarted impala service, now my query is running flawless on server having 4G memory.

 

Thank you for your support.

 

Regards,

Nagendra Rai

avatar
Explorer
 

avatar

The advice in this thread is out of date: memory usage for joins and aggregations has been improved a lot in CDH5.5. 

 

Your issue is something different since the query doesn't have a join or group by (aggregation) in it.

 

The first step to understand this better is to look at the impalad logs: there is usually some information in there about why the memory limit was exceeded and what operators were consuming memory.