New Contributor
Posts: 2
Registered: ‎10-03-2015

Unable to insert into a dynamic partition parquet table

Hi

 

I am trying to INSERT OVERWRITE data from an unpartitioned text table into a dynamic-partition Parquet table, and I am running into multiple issues.

 

Issue one: a Java heap size error. When I set the properties below, the heap error goes away, but the containers are getting killed.

 

Ten mappers start; nine of them complete in less than 10 seconds, while one runs for around 30 minutes as the reducers start up. That long-running mapper attempt eventually gets killed with:

"Container preempted by scheduler"

and at the same time the reducers are getting killed with:

Reducer preempted to make room for pending map attempts
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

The properties I set:
SET mapred.child.java.opts=-Xmx4g;
SET mapred.map.child.java.opts=-Xmx4g;
SET mapred.reduce.child.java.opts=-Xmx4g;

SET mapreduce.map.memory.mb=6000;
SET mapreduce.reduce.memory.mb=6000;

SET mapreduce.map.java.opts=-Xmx4g;
SET mapreduce.reduce.java.opts=-Xmx4g;

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=1000;

SET hive.exec.parallel=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.mapred.mode=nonstrict;
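
Side note for readers: with dynamic partitions, each task keeps one Parquet writer (and its in-memory row-group buffer) open per partition it writes, which is the usual source of heap pressure here. A minimal sketch of a setting that targets this, assuming CDH 5.4's Hive 1.1 where it is available (not verified on this cluster):

-- Sort rows by the partition column before writing, so each task holds
-- only one open Parquet writer (one row-group buffer) at a time instead
-- of one per partition:
SET hive.optimize.sort.dynamic.partition=true;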

 

 

The insert queries I tried:

q1:  FROM (SELECT col1, col2, ..., partition_col FROM text_table DISTRIBUTE BY partition_col) T INSERT OVERWRITE TABLE parquet_table PARTITION (partition_col) SELECT col1, col2, ..., partition_col;

 

q2:  INSERT OVERWRITE TABLE parquet_table PARTITION (partition_col) SELECT col1, col2, partition_col FROM text_table;
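
For context on why these inserts run out of heap: each open Parquet writer buffers roughly one row group in memory (parquet.block.size, 128 MB by default), so a mapper writing N partitions can hold N such buffers at once. A sketch of shrinking that buffer, assuming 64 MB row groups are acceptable for whatever reads the table later:

-- Hypothetical illustration: smaller row groups mean less heap held per
-- open Parquet writer (trade-off: more, smaller row groups on disk):
SET parquet.block.size=67108864;  -- 64 MB; the Parquet default is 128 MB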

 

 

Available table info:

- Distinct partition values in the text table: 111
- Number of rows: 727,921,424
- Size: 67.5 GB
- Version: CDH 5.4.5

Let me know the best approach. Looking forward to your suggestions.

Thank you

Good Day

Rohitha

Contributor
Posts: 31
Registered: ‎06-26-2015

Re: Unable to insert into a dynamic partition parquet table

This configuration worked for me:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET parquet.compression=snappy;
SET mapreduce.map.java.opts=-Xmx15g;
SET mapreduce.map.memory.mb=20000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET mapred.max.split.size=256000000;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
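
A note on the sizing above: the JVM heap has to fit inside the YARN container with room left for off-heap overhead, so keeping -Xmx at roughly 75-80% of mapreduce.map.memory.mb (as in the 15g/20000 pair here) is the usual rule of thumb. A scaled-down sketch under that same assumption:

-- Hypothetical smaller sizing, keeping the ~75% heap-to-container ratio:
SET mapreduce.map.memory.mb=8192;
SET mapreduce.map.java.opts=-Xmx6g;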

 

The data size was large, so I had to give a large heap and split the data with a WHERE condition, running the insert multiple times; a sketch of that is below.
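
A minimal sketch of the split-and-rerun pattern (the BETWEEN ranges and an integer partition_col are assumptions; with dynamic partitions, INSERT OVERWRITE only replaces the partitions a run actually produces, so the runs do not clobber each other):

-- First run: roughly half of the 111 partition values (hypothetical range):
INSERT OVERWRITE TABLE parquet_table PARTITION (partition_col)
SELECT col1, col2, partition_col FROM text_table
WHERE partition_col BETWEEN 1 AND 55;

-- Second run: the remaining values:
INSERT OVERWRITE TABLE parquet_table PARTITION (partition_col)
SELECT col1, col2, partition_col FROM text_table
WHERE partition_col BETWEEN 56 AND 111;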

Hope this helps.

New Contributor
Posts: 2
Registered: ‎10-03-2015

Re: Unable to insert into a dynamic partition parquet table

Thank you, Ben. A combination of these properties worked for me.

New Contributor
Posts: 3
Registered: ‎04-21-2016

Re: Unable to insert into a dynamic partition parquet table

Hi Rohitha,

 

Can you please share the properties that worked for you? (I know it's been a while, but if you remember, please share.)

 

Thanks.
