New Contributor

INSERT ... SELECT fails on memory limit exceeded


I'm exploring Impala by using a CDH VM.

I'm trying to convert my TSV data to Parquet format, but the query constantly fails with a memory limit error.

There are no other queries running in parallel.

explain insert overwrite  frecordsParquet select * from frecords;
Query: explain insert overwrite  frecordsParquet select * from frecords
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.05GB VCores=1                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.frecords                                                                   |
|                                                                                    |
| WRITE TO HDFS [default.frecordsparquet, OVERWRITE=true]                            |
| |  partitions=1                                                                    |
| |                                                                                  |
| 00:SCAN HDFS [default.frecords]                                                    |
|    partitions=1/1 files=56123 size=15.43GB                                         |
+------------------------------------------------------------------------------------+

 

I tried setting PARQUET_FILE_SIZE to 16m, hoping that Impala would use less memory, but it didn't help.
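
For reference, this is roughly what I ran from impala-shell:

set PARQUET_FILE_SIZE=16m;
insert overwrite frecordsParquet select * from frecords;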

 

Memory Limit Exceeded
Process: memory limit exceeded. Limit=2.99 GB Consumption=3.74 GB
  RequestPool=default-pool: Consumption=86.91 MB
    Query(c64edf272a4e263c:5f481efc69624fba) Limit: Consumption=86.91 MB
      Fragment c64edf272a4e263c:5f481efc69624fbb: Consumption=86.91 MB
        HDFS_SCAN_NODE (id=0): Consumption=37.90 MB
        HdfsTableSink: Consumption=47.93 MB
      Block Manager: Limit=2.39 GB Consumption=0
WARNING: The following tables are missing relevant table and/or column statistics.
default.frecords

All I want is for Impala to read some data, write it to Parquet, and then continue reading more data.

What's the best way to achieve this?

thanks

Cloudera Employee

Re: INSERT ... SELECT fails on memory limit exceeded

You may be running into https://issues.cloudera.org/browse/IMPALA-2940, which affects queries that scan a lot of Parquet data. We've fixed the issue and the fix will appear in Impala 2.3.4 and 2.5.0.

 

A workaround is to break up the large insert into multiple smaller queries, each processing a subset of the data, as sketched below.
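
For example, assuming the source table has a column you can split on (event_date below is just a placeholder; use whatever column lets you carve frecords into manageable pieces), it could look something like this:

-- the first chunk replaces the table contents, later chunks append
insert overwrite frecordsParquet
  select * from frecords where event_date < '2015-07-01';
insert into frecordsParquet
  select * from frecords where event_date >= '2015-07-01';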

 

I would also strongly recommend giving more memory to Impala; you'll run into memory limits fairly frequently with only 3 GB allocated to the Impala daemon.

New Contributor

Re: INSERT ... SELECT fails on memory limit exceeded

 

I upgraded Impala, but I still encounter the same error.

 

Any more suggestions?

 

I also tried to run COMPUTE STATS, but after a while it fails as well.
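
The stats command I ran was along the lines of this, on the source table named in the warning:

compute stats frecords;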
