"Memory Limit Exceeded" error on Impala when inserting into a partitioned table


I am trying to run the following query on Impala, and I'm getting a "Memory Limit Exceeded" error:

```
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```

I've tried prepending `SET MEM_LIMIT=16g`, to no avail: the per-node memory limit does not change, as shown below:

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

Note the "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB" line, despite my asking for 16 GB.
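
For reference, here is exactly how I'm issuing the option in impala-shell, in case that part is wrong:

```
SET MEM_LIMIT=16g;
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```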

 

The EXPLAIN output, meanwhile, suggests the query needs far less memory:

```

Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
| partitions=173
|
01:EXCHANGE [UNPARTITIONED]
| limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
partitions=173/173 files=977 size=2.62GB
limit: 100000000

```
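
Both the error output and the plan warn about missing statistics on the table. I can run COMPUTE STATS, shown below, though I'm not sure that's related to the memory error:

```
compute stats ravenfish.flattened_impressions;
```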

 

I'm trying to duplicate the data that already exists in the table. Is there a different way to do this? I am on CDH 5.4.
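
One alternative I've considered is copying a single partition at a time rather than all 173 at once; a rough sketch, where 'example_project' is a placeholder value:

```
-- hypothetical: copy one partition per statement instead of the whole table
insert into flattened_impressions partition(project)
select * from flattened_impressions where project = 'example_project';
```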

 

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way to make sure the Parquet files come out larger? This may be related to the issue.
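
I came across the PARQUET_FILE_SIZE query option; would setting it along these lines help? (The 256m value is just a guess on my part.)

```
SET PARQUET_FILE_SIZE=256m;
```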

 

Thanks!

Regards,

Ian
