
"Memory Limit Exceeded" error on Impala when inserting into a partitioned table


I am trying to run the following query on Impala, and I'm getting a "Memory Limit Exceeded" error:

```
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```

I've tried prepending `SET MEM_LIMIT=16g`, to no avail; the memory limit per node does not change, as shown below:

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

Note the line "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB", despite my asking for 16 GB. The query-level line does show Limit=16.00 GB, so it looks as though the 1 GB cap is a process-wide limit that the query option doesn't touch.
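
For reference, this is how I'm prepending the option (a sketch of my session; same insert as above):

```
-- Per-query memory limit, set before running the insert:
SET MEM_LIMIT=16g;

-- The same partitioned insert as above:
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```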

 

The EXPLAIN output tells me the query should need much less:

```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
|  partitions=173
|
01:EXCHANGE [UNPARTITIONED]
|  limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
   partitions=173/173 files=977 size=2.62GB
   limit: 100000000
```
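
Both the error output and the EXPLAIN warn that the table is missing statistics. I assume that part, at least, is addressed by computing stats on the table; a minimal sketch (I haven't confirmed whether it changes the memory behavior here):

```
-- Gather table and column statistics so the planner's per-host
-- memory estimates come from real row counts:
COMPUTE STATS ravenfish.flattened_impressions;
```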

 

I'm trying to duplicate the data that already exists in the table. Is there a different way of doing this? I am on CDH 5.4.

 

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way of making sure the Parquet files come out larger? This may be related to the issue.
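
For reference, the two knobs I've come across for this are below; treat them as a sketch, since I haven't confirmed that either behaves this way on CDH 5.4:

```
-- PARQUET_FILE_SIZE is the query option for the target size of
-- Parquet files written by the session:
SET PARQUET_FILE_SIZE=256m;

-- The [SHUFFLE] insert hint redistributes rows by partition key before
-- the write, so each partition is written by fewer nodes, producing
-- fewer, larger files:
insert into flattened_impressions
  partition(project) [SHUFFLE]
select * from flattened_impressions limit 100000000;
```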

 

Thanks!

 

Regards,

Ian
