
Memory limit exceeded for Impala job


New Contributor

Query: INSERT INTO data_lake.asrasr_sd6 PARTITION (year, month, day) [SHUFFLE] SELECT *, year(calendar_timestamp) AS year, month(calendar_timestamp) AS month, day(calendar_timestamp) AS day FROM data_lake_tmp.asrasr_sd6
at com.cloudera.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.cloudera.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.cloudera.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.cloudera.jdbc.common.SStatement.execute(Unknown Source)
at org.pentaho.di.core.database.Database.execStatement(Database.java:1533)
at org.pentaho.di.core.database.Database.execStatements(Database.java:1660)
at org.pentaho.di.trans.steps.sql.ExecSQL.processRow(ExecSQL.java:211)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
Caused by: com.cloudera.support.exceptions.GeneralException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: ERROR_STATE, SQL state:
Memory limit exceeded

2 Replies

Re: Memory limit exceeded for Impala job

New Contributor

Can someone help me with this issue?

Re: Memory limit exceeded for Impala job

Master Collaborator

It would be helpful if you provided a query profile or plan; it's hard to give accurate advice otherwise.
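For reference, a minimal way to capture both from impala-shell (reusing the table names from your post) would be something like:

-- Show the query plan without running the insert
EXPLAIN
INSERT INTO data_lake.asrasr_sd6 PARTITION (year, month, day) [SHUFFLE]
SELECT *,
       year(calendar_timestamp) AS year,
       month(calendar_timestamp) AS month,
       day(calendar_timestamp) AS day
FROM data_lake_tmp.asrasr_sd6;

-- After the query has run in impala-shell, dump the full runtime profile
PROFILE;

-- Or just the per-operator summary (peak memory is the key column)
SUMMARY;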

 

My guess, based on the limited information, is that you are inserting into a large number of partitions and the query is running out of memory because it buffers data for all of those partitions at once. (For Parquet tables, each open partition typically holds a write buffer of up to a full Parquet block, so memory scales with the number of partitions being written concurrently per node.)

 

It could also easily be concurrent queries or a low memory limit for the query; I'm just making an educated guess based on what I've seen other people run into.
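If it is the per-query memory limit, you can inspect and raise it for your session from impala-shell; the 16g below is purely illustrative, so size it to your nodes:

-- List current query options, including MEM_LIMIT
SET;

-- Raise the per-query memory limit for this session (illustrative value)
SET MEM_LIMIT=16g;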

 

There are two ways to solve that:

* Reduce the number of partitions that you're inserting into with a single query (change the table's partitioning, or break the insert up into multiple queries, each covering a subset of partitions). See the first sketch after this list.

* Add a CLUSTERED hint to the insert so that Impala sorts the data by the partition columns and then writes one partition at a time. This reduces the memory requirement drastically. See the second sketch after this list.
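Both approaches, sketched against the query from your post (the date range in the first sketch is hypothetical, and comma-separated comment-style hints may vary by Impala version):

-- Approach 1: insert a subset of partitions per query, repeating with
-- different (hypothetical) date ranges until the table is loaded
INSERT INTO data_lake.asrasr_sd6 PARTITION (year, month, day) [SHUFFLE]
SELECT *,
       year(calendar_timestamp) AS year,
       month(calendar_timestamp) AS month,
       day(calendar_timestamp) AS day
FROM data_lake_tmp.asrasr_sd6
WHERE calendar_timestamp >= '2019-01-01'
  AND calendar_timestamp < '2019-02-01';

-- Approach 2: add the CLUSTERED hint so rows are sorted by the partition
-- columns and only one partition writer is open at a time per node
INSERT INTO data_lake.asrasr_sd6 PARTITION (year, month, day) /* +SHUFFLE,CLUSTERED */
SELECT *,
       year(calendar_timestamp) AS year,
       month(calendar_timestamp) AS month,
       day(calendar_timestamp) AS day
FROM data_lake_tmp.asrasr_sd6;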
