12-02-2015 09:25 AM
I am trying to run the following query on Impala and I'm getting a "Memory Limit Exceeded" error:

```
insert into flattened_impressions partition(project)
select * from flattened_impressions
limit 100000000;
```

I've tried prepending `SET MEM_LIMIT=16g`, to no avail. The memory per box does not change, as shown below:

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

Note the line "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB", despite my asking for 16 GB.
The EXPLAIN output tells me I need much less:

```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
|  partitions=173
|
01:EXCHANGE [UNPARTITIONED]
|  limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
   partitions=173/173 files=977 size=2.62GB
   limit: 100000000
```

I'm trying to duplicate the data that already exists in the table. Is there a different way of doing this? I am on CDH 5.4.

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way to make sure the Parquet files come out larger? This may be related to the issue.

Thanks!

Regards,
Ian
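For completeness, this is the exact sequence I'm running in impala-shell (16g is just the value I tried; the limit is meant to apply per node, per query):

```sql
-- Session-level memory limit I set before the insert
SET MEM_LIMIT=16g;

-- The failing statement: copying the table onto itself to duplicate the data
insert into flattened_impressions partition(project)
select * from flattened_impressions
limit 100000000;
```

I have not yet run COMPUTE STATS on the table, which may explain the missing-statistics warning in the output above.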