12-02-2015 09:25 AM
I am trying to run the following query on Impala and I'm getting a "Memory Limit Exceeded" error:

```
insert into flattened_impressions partition(project)
select * from flattened_impressions
limit 100000000;
```

I've tried prepending `SET MEM_LIMIT=16g`, to no avail. The memory per box does not change, as shown below:

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

Note the line "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB", despite my asking for 16 GB.
The EXPLAIN output tells me I need much less:

```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
|  partitions=173
|
01:EXCHANGE [UNPARTITIONED]
|  limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
   partitions=173/173 files=977 size=2.62GB
   limit: 100000000
```

I'm trying to duplicate the data that already exists in the table. Is there a different way of doing this? I am on CDH 5.4.

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way to make sure the Parquet files come out larger? This may be related to the issue.

Thanks!

Regards,
Ian
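For completeness, this is the exact sequence I'm running in impala-shell (16g is just the value I tried; the limit is meant to apply per node, per query):

```sql
-- Session-level memory limit I set before the insert
SET MEM_LIMIT=16g;

-- The failing statement: copying the table onto itself to duplicate the data
insert into flattened_impressions partition(project)
select * from flattened_impressions
limit 100000000;
```

I have not yet run COMPUTE STATS on the table, which may explain the missing-statistics warning in the output above.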