New Contributor
Posts: 2
Registered: ‎01-06-2018

Hive query failing because container is using memory beyond limits

Hi All,

 

We need some help with an error that occurs only rarely in a job that runs on Hive through Beeline. Please suggest the likely reason for the failure and how we can overcome it.

 

Error Message: “Container  is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 3.6 GB of 4.2 GB virtual memory used. Killing container.”
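
One observation on this message: the limit it reports is 2 GB, while the query below requests 4096 MB for reducers only, so the killed container may not be a reducer at all. A minimal sketch of the map-side counterparts of those settings follows. It assumes the job runs on the MapReduce engine and that the killed container is a map task, neither of which is confirmed here, and the values are illustrative only.

```sql
-- Assumption: execution engine is MapReduce and the killed container is a map task.
-- Illustrative values only, mirroring the reducer settings used in the query.
SET mapreduce.map.memory.mb=4096;       -- container size requested from YARN for each map task
SET mapreduce.map.java.opts=-Xmx3072M;  -- JVM heap, kept below the container size to leave headroom
```

If the job runs on Tez instead, the container size is governed by hive.tez.container.size rather than the mapreduce.* properties.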

 

When the job fails and we rerun it, the load completes successfully.

We are trying to load around 26 million records after consolidating 751,721 delta records with 25,774,647 full records.

The number of records processed is the same in both the successful and the unsuccessful runs. The query being run is below:

 

SET mapreduce.reduce.memory.mb=4096;

SET mapreduce.reduce.java.opts=-Xmx3072M;

set hive.auto.convert.join=false;

drop table if exists A;

create external table A (column_names <DATATYPES>); ---- Total 180 columns.

 

insert into table A
select *
from (
  select c.*,
         row_number() over (partition by c.<primary_key_columns> order by cast(c.SEQUENCE_NUMBER as bigint) desc) as rnk
  from (
    select * from delta_table
    union all
    select a.* from full_table a where year >= '2017'
  ) c
) d
where d.rnk = 1;

 

SEQUENCE_NUMBER is unique for each record and comes from the source. To avoid multiple records for the same primary key combination and to pick up the latest updates, we keep only the record with the maximum SEQUENCE_NUMBER for each key.
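
To illustrate the deduplication logic on a tiny data set, here is a sketch with made-up rows. The column pk is a hypothetical stand-in for the primary key columns, and the inline rows are not real data. It assumes a Hive version that allows SELECT without a FROM clause (0.13 or later).

```sql
-- Hypothetical sample rows: pk stands in for the real primary key columns.
select d.pk, d.sequence_number
from (
  select c.pk,
         c.sequence_number,
         -- cast to bigint so that '10' ranks above '9'; as strings, '9' would sort higher
         row_number() over (partition by c.pk order by cast(c.sequence_number as bigint) desc) as rnk
  from (
    select 1 as pk, '9' as sequence_number
    union all
    select 1, '10'
    union all
    select 2, '7'
  ) c
) d
where d.rnk = 1;
```

For pk 1 this keeps the row with sequence_number 10, and for pk 2 the row with sequence_number 7. Selecting named columns in the outer query, rather than *, also keeps the helper column rnk out of the result.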
