"Memory Limit Exceeded" error on Impala when inserting into a partitioned table

New Contributor

I am trying to run the following query on Impala, and I'm getting a "Memory Limit Exceeded" error:

```
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```

I've tried prepending `SET MEM_LIMIT=16g`, to no avail; the memory limit per host does not change, as shown below:

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

 

Note that it says "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB", despite my asking for 16 GB.
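
Those two numbers appear to come from two different limits. A minimal sketch of the distinction, as far as I can tell (the `-mem_limit` startup flag name is taken from the Impala docs; verify it for your version):

```
-- Process limit ("Process: ... Limit=1.00 GB" above): fixed when the
-- impalad daemon starts (the -mem_limit startup flag, or the matching
-- Cloudera Manager memory setting). A query session cannot raise it.

-- Query limit ("Query(...) Limit: Limit=16.00 GB" above): set per session,
-- but still capped by the process limit, so the 16g never takes effect:
SET MEM_LIMIT=16g;
```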

 

The EXPLAIN tells me I need much less:

 

```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
| partitions=173
|
01:EXCHANGE [UNPARTITIONED]
| limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
partitions=173/173 files=977 size=2.62GB
limit: 100000000
```
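
Separately, both the error output and the plan warn that `ravenfish.flattened_impressions` is missing statistics. Computing them should at least give the planner accurate estimates; a minimal sketch using standard Impala SQL (not something verified in this thread):

```
-- Gather table and column statistics so the planner can estimate memory
-- accurately and clear the "missing statistics" warning:
COMPUTE STATS ravenfish.flattened_impressions;
```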

 

I'm trying to duplicate the data that already exists in the tables. Is there a different way of doing this? I am on CDH 5.4.

 

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way to make sure the Parquet files are larger? This may be related to the issue.
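
A sketch of two settings that might be relevant here, assuming Impala 2.x on CDH 5.4 (neither is from the original post, so check the docs for your version): the `PARQUET_FILE_SIZE` query option sets the target output file size, and a `[SHUFFLE]` hint on a partitioned insert sends each partition's rows through a single writer, which should yield fewer, larger files:

```
-- Sketch only; verify both options against your Impala version.
SET PARQUET_FILE_SIZE=256m;  -- target size for each Parquet output file

insert into flattened_impressions partition(project) [SHUFFLE]
select * from flattened_impressions limit 100000000;
-- [SHUFFLE] repartitions rows on the partition key before the write, so
-- each partition is written by one node rather than a sliver by every node.
```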

 

Thanks!

 

Regards,

Ian

12 Replies

Explorer

I tried this too. It did not work because Impala was allocated only 256 MB of memory. However, I solved the memory issue via Cloudera Manager: Impala > Configuration > search for "memory" > change from 256 MB to whatever is required. This makes the right limit take effect for future shell commands. Thanks for the reply though.
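
A quick way to confirm the new limit took effect (a sketch; the "Process: ... Limit=" wording is taken from the log earlier in this thread):

```
-- After raising the limit in Cloudera Manager and restarting the Impala
-- service, re-run the statement; any further memory error should report
-- the raised value on its "Process: ... Limit=" line.
insert into flattened_impressions partition(project)
select * from flattened_impressions limit 100000000;
```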

New Contributor

Hello, what are the risks of applying this solution? Can other queries with partitioned tables lose performance?


I'm not sure there are risks specifically. The best practice is to use Cloudera Manager to configure memory limits for the different services, so this is the right way to configure things.

 

Cloudera Manager does have support to help set up memory limits for applications: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_autoconfig.html#concept_xjy_vb...

 

For a production system, it's important to put thought into how much memory your system needs and how it's allocated between the different services. E.g., as an earlier poster saw, 256 MB is not enough memory to do anything interesting with Impala.