Query: INSERT INTO data_lake.asrasr_sd6 PARTITION (year, month, day) [SHUFFLE] SELECT *, year(calendar_timestamp) as year, month(calendar_timestamp) as month, day(calendar_timestamp) as day FROM data_lake_tmp.asrasr_sd6.
at com.cloudera.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
at com.cloudera.hivecommon.api.HS2Client.executeStatement(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
at com.cloudera.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
at com.cloudera.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.cloudera.jdbc.common.SStatement.execute(Unknown Source)
Caused by: com.cloudera.support.exceptions.GeneralException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: ERROR_STATE, SQL state:
Memory limit exceeded
It would be helpful if you provided a profile or query plan - it's hard to provide accurate advice otherwise.
My guess based on the limited information is that you are inserting into a large number of partitions and that the query is running out of memory because it is buffering all those partitions in memory at once.
It could also easily be concurrent queries or a low memory limit for the query, but I am making an educated guess based on past experience of what I've seen other people run into.
There would be two ways to solve that:
* Reduce the number of partitions that you're inserting into with a single query (change the table's partitioning, break up the insert into multiple queries, each for a subset of partitions).
* Add a CLUSTERED hint to the insert so that it sorts the data and then inserts into a partition at a time. This reduces the memory requirements drastically.