Created on 12-02-2015 09:25 AM - edited 09-16-2022 02:50 AM
I am trying to run the following query on Impala and I'm getting a "Memory Limit Exceeded" error:
```
insert into flattened_impressions
partition(project)
select * from flattened_impressions limit 100000000;
```
I've tried prepending `SET MEM_LIMIT=16g;` to the query, to no avail. The memory limit per box does not change, as shown below:
```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
RequestPool=root.imacalinao: Consumption=618.70 MB
Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
EXCHANGE_NODE (id=1): Consumption=0
DataStreamRecvr: Consumption=38.53 MB
HdfsTableSink: Consumption=512.37 MB
Block Manager: Limit=819.20 MB Consumption=0
Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```
Note the "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB" line, despite my asking for 16 GB.
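My guess from that line is that the 1.00 GB cap is enforced at the daemon level rather than per query, i.e. something like the impalad startup memory limit, which a per-query `SET MEM_LIMIT` can't raise (the flag value below is an assumption about my cluster's configuration, not something I've confirmed):
```
# Assumption: the "Process" limit comes from the impalad daemon's startup
# -mem_limit flag (or the equivalent Cloudera Manager setting), set when
# the daemon starts. Value shown is illustrative only.
impalad -mem_limit=1g ...
```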
The EXPLAIN output tells me the query needs far less:
```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
| partitions=173
|
01:EXCHANGE [UNPARTITIONED]
| limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
partitions=173/173 files=977 size=2.62GB
limit: 100000000
```
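Side question: both the error output and the EXPLAIN warn that `ravenfish.flattened_impressions` is missing statistics. I assume I can clear that warning with the statement below, though I'm not sure it has any bearing on the process-level memory cap:
```
-- Collect table and column statistics so the planner gets better
-- estimates for this table:
COMPUTE STATS ravenfish.flattened_impressions;
```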
I'm trying to duplicate the data that already exists in the table. Is there a better way of doing this? I am on CDH 5.4.
Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way of making sure the Parquet files come out larger? This may be related to the issue.
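For what it's worth, here is what I was thinking of trying for the file-size problem (assuming `PARQUET_FILE_SIZE` is the right query option for the target output file size; the 256 MB value is just an example):
```
-- Assumption: PARQUET_FILE_SIZE sets the target size per output
-- Parquet file, in bytes (268435456 bytes = 256 MB):
SET PARQUET_FILE_SIZE=268435456;
insert into flattened_impressions
partition(project)
select * from flattened_impressions limit 100000000;
```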
Thanks!
Regards,
Ian