
"Memory Limit Exceeded" error on Impala when inserting into a partitioned table

New Contributor

I am trying to run the following query on Impala and I'm getting a "Memory Limit Exceeded" error:

 

```
insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```

 

I've tried prepending `SET MEM_LIMIT=16g`, to no avail; the full statement looked roughly like this:
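```
set mem_limit=16g;

insert into flattened_impressions
  partition(project)
select * from flattened_impressions limit 100000000;
```

The reported memory limit per box does not change, as shown below: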

 

```
Query 6d4f783fcda5ee50:1ed49f2c6d07aea1: 5% Complete (57 out of 977)
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
Memory Limit Exceeded
Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB
  RequestPool=root.imacalinao: Consumption=618.70 MB
    Query(6d4f783fcda5ee50:1ed49f2c6d07aea1) Limit: Limit=16.00 GB Consumption=618.70 MB
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea2: Consumption=550.91 MB
        EXCHANGE_NODE (id=1): Consumption=0
        DataStreamRecvr: Consumption=38.53 MB
        HdfsTableSink: Consumption=512.37 MB
      Block Manager: Limit=819.20 MB Consumption=0
      Fragment 6d4f783fcda5ee50:1ed49f2c6d07aea6: Consumption=67.79 MB
        HDFS_SCAN_NODE (id=0): Consumption=67.21 MB
        DataStreamSender: Consumption=4.00 KB
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions
```

 

Note that it says "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB", despite my asking for 16 GB.

 

The EXPLAIN output, however, tells me the query needs much less:

 

```
Estimated Per-Host Requirements: Memory=576.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
ravenfish.flattened_impressions

WRITE TO HDFS [ravenfish.flattened_impressions, OVERWRITE=false, PARTITION-KEYS=(ravenfish.flattened_impressions.project)]
| partitions=173
|
01:EXCHANGE [UNPARTITIONED]
| limit: 100000000
|
00:SCAN HDFS [ravenfish.flattened_impressions]
partitions=173/173 files=977 size=2.62GB
limit: 100000000
```

 

I'm trying to duplicate the data that already exists in the tables. Is there a different way of doing this? I am on CDH 5.4.

 

Also, the Parquet files generated by Impala are less than 1 MB each. Is there a way of making sure the Parquet files are larger? This may be related to the issue.

 

Thanks!

 

Regards,

Ian

12 REPLIES

It looks like your Impala process memory limit is set to 1GB. Queries can't use more than the process memory limit, even if the query memory limit is set higher. To work around this, you or your administrator need to restart Impala with a higher process memory limit. 1GB is very low and you will run out of memory on many queries.

 

The process memory limit is set when Impala is started, via the `-mem_limit` option to impalad. The default is 80% of the machine's physical memory. The valid options are described in `impalad --help`:

 

```
-mem_limit (Process memory limit specified as number of bytes
('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'), or
percentage of the physical memory ('<int>%'). Defaults to bytes if no
unit is given) type: string default: "80%"
```
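For example, on a package-based install the flag typically goes into the daemon's startup arguments (a sketch; the file path and service name below are common defaults but vary by install, and Cloudera Manager users should change this through CM instead):

```
# Sketch for a package-based install; paths and service names may differ.
# 1. Add the flag to IMPALA_SERVER_ARGS in /etc/default/impala, e.g.:
#      IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -mem_limit=16g"
# 2. Restart the daemon so the new process limit takes effect:
sudo service impala-server restart
```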

 

 

Explorer

Users are getting *non-deterministic* "Memory limit exceeded" errors for Impala queries. The Impala Daemon Memory Limit is 100 GB, and spill to disk is enabled.

 

However, a query failed with the above memory error even though its Aggregate Peak Memory Usage was only 125 MiB.

 

I explored the query profile via CM -> Impala -> Queries -> {failed query "oom=true AND stats_missing=false"}.

 

I'd like help narrowing down the cause of the failure: inaccurate stats, congestion, HDFS disk rebalancing, or something else?

 

Where can I find the details of the failure? /var/log/impalad and the catalogd logs state the query ID but not the failure details. For example, the Impala logs stated the query ID only: a24fb2eae077b513:45c8d35936a35e9e
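For example, grepping the logs for the query ID (command reconstructed from the output below; exact paths may differ) turns up only the Prepare() line:

```
grep a24fb2eae077b513 /var/log/impalad/*
```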

 

```
impalad.w0575oslphcda11.bell.corp.bce.ca.impala.log.INFO.20170124-031735.13521:I0124 03:57:06.889070 44232 plan-fragment-executor.cc:92] Prepare(): query_id=a24fb2eae077b513:45c8d35936a35e93 instance_id=a24fb2eae077b513:45c8d35936a35e9e
```

 

 

I think the key question is why the effective process memory limit is 1GB. In the error you pasted above it says the process limit is 1.00GB: "Process: memory limit exceeded. Limit=1.00 GB Consumption=1.15 GB"

 

You can look at the memory limits and consumption on an Impala daemon's debug page at http://hostname:25000/memz?detailed=true - that may help identify any problematic Impala daemons.
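For example (illustrative; assumes the default debug webserver port and that the page is reachable from your machine):

```
# Dump the detailed memory breakdown from one impalad's debug webserver
curl 'http://hostname:25000/memz?detailed=true'
```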

 

Even if you're setting the memory limit to 100GB, the Impala daemon does some additional checks on startup for the amount of memory available. You can look for a couple of things in the logs:

 

1. If VM overcommit and swapping are both disabled, you may see a message in the WARNING and INFO logs along the lines of "This system shows a discrepancy between the available memory and the memory commit limit ..." that will explain what the effective memory limit is

2. Impala's view of the amount of physical memory available, e.g. "I0208 14:54:02.425094  4482 init.cc:220] Physical Memory: 31.33 GB"

There may also be more details about the "memory limit exceeded" error in the log on a different Impala daemon where it did run out of memory.

 

I should also mention that we've been making a big push to improve the memory consumption and general performance of partitioned inserts - currently, with dynamic partitioning, memory consumption is very high when there are a large number of partitions. See https://issues.cloudera.org/browse/IMPALA-2522
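In the meantime, one commonly documented mitigation is a shuffle hint on the insert, which repartitions rows by the partition key so each node writes fewer partitions concurrently (a sketch reusing the statement from the original question, not a guaranteed fix):

```
-- Sketch: the /* +shuffle */ hint adds an exchange that groups rows by the
-- partition key, so each impalad keeps fewer partition writers open at once
-- and the HdfsTableSink needs less memory.
insert into flattened_impressions
  partition(project) /* +shuffle */
select * from flattened_impressions limit 100000000;
```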

New Contributor

Hi, 

 

I am unable to implement your suggestions. Please can you elaborate? How do I increase the memory of my Impala daemon from 256 MB? Setting mem_limit at startup did not work. My queries need more memory. My computer has 10 GB of RAM overall.

Expert Contributor

@adi91 - How did you set --mem_limit? What value did you pass to it? What did http://hostname:25000/memz?detailed=true say after applying --mem_limit to the command line options? Did your value show up there?

New Contributor

(attached screenshots: 19.PNG and 18.PNG)

I gave `impalad --mem_limit '1 gG'` as my command.

New Contributor

Hi @Lars Volker @Tim Armstrong,

When I run impalad with --mem_limit set to 1gb (or 6gb, etc.), I get the error shown in the attached image (22.PNG).

Champion

Go inside the Impala shell: execute `impala-shell` in the terminal. Once you are at the prompt, fire this command inside the shell:

[localhost:21000] > set mem_limit=6g;

Let me know if that works for you.
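A full session looks something like this (illustrative output; in impala-shell, running SET with no arguments lists the current query options, so you can confirm the new value took effect):

```
$ impala-shell
[localhost:21000] > set mem_limit=6g;
MEM_LIMIT set to 6g
[localhost:21000] > set;
...
  MEM_LIMIT: 6g
...
```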

 

 

 

New Contributor

I tried this too. It did not work because Impala was allocated only 256 MB of memory. However, I solved the memory issue via Cloudera Manager: Impala > Configuration > search for "memory" > change from 256 MB to whatever is required. For future reference, this makes subsequent shell commands pick up the right limit. Thanks for the reply though.

New Contributor

Hello, what are the risks of applying this solution? Can other queries on partitioned tables lose performance?

I'm not sure there are risks specifically. The best practice is to use Cloudera Manager to configure memory limits for different services, so this is the right way to configure things.

 

Cloudera Manager does have support to help set up memory limits for applications: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_autoconfig.html#concept_xjy_vb...

 

For a production system, it's important to put thought into how much memory your system needs and how it's allocated between different services. E.g., as an earlier poster saw, 256 MB is not enough memory to do much of interest with Impala.

 

 
