Welcome to the Cloudera Community

Unable to insert into a dynamic partition parquet table

Hi,

I am trying to insert overwrite data from an unpartitioned text table into a dynamically partitioned Parquet table, and I am running into multiple issues.

Issue one: Java heap size errors. When I set the properties listed below, the heap errors go away, but the containers get killed instead.

Ten mappers start, and nine of them complete in under 10 seconds, while one mapper runs for around 30 minutes. Meanwhile the reducers start. The long-running map attempt is then killed with:

"Container preempted by scheduler"

and at the same time the reducers are killed with:

"Reducer preempted to make room for pending map attempts
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143"
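
The second message comes from the MapReduce ApplicationMaster, which preempts running reducers when map attempts are still waiting for containers. Reducers are scheduled once a fraction of the maps has finished, controlled by mapreduce.job.reduce.slowstart.completedmaps (0.05 by default in Hadoop 2). One setting that may be worth trying, assuming the job runs on MapReduce v2 under YARN as in CDH 5, is to hold reducers back until every map has completed. This is a sketch, not a confirmed fix for this job:

```sql
-- Assumption: MapReduce v2 on YARN (CDH 5).
-- Do not schedule any reducer until 100% of the map tasks have completed,
-- so reducers do not occupy containers that a pending map attempt needs.
SET mapreduce.job.reduce.slowstart.completedmaps=1.0;
```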

-- JVM heap for task attempts (older MRv1 property names)
SET mapred.child.java.opts=-Xmx4g;
SET mapred.map.child.java.opts=-Xmx4g;
SET mapred.reduce.child.java.opts=-Xmx4g;

-- YARN container size (MB) for map and reduce tasks
SET mapreduce.map.memory.mb=6000;
SET mapreduce.reduce.memory.mb=6000;

-- JVM heap for map and reduce tasks (MRv2 property names)
SET mapreduce.map.java.opts=-Xmx4g;
SET mapreduce.reduce.java.opts=-Xmx4g;

-- Compress job output with Snappy, block compression
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

-- Limits on the number of dynamic partitions created
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=1000;

-- Parallel stage execution and dynamic partitioning
SET hive.exec.parallel=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.mapred.mode=nonstrict;
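
Dynamic partition inserts into Parquet keep a separate open Parquet writer, each with its own in-memory buffer, for every partition value a task writes to, so a task that touches many of the 111 values can exhaust its heap. Hive 0.13 and later provide hive.optimize.sort.dynamic.partition, which sorts rows on the partition column so that each reducer keeps only one writer open at a time. A minimal addition to the settings above, assuming the Hive version shipped in CDH 5.4.5 supports it:

```sql
-- Assumption: the Hive release in CDH 5.4.5 is 0.13 or later.
-- Sort rows by the dynamic partition column so each reducer holds one
-- Parquet record writer open at a time instead of one per partition value.
SET hive.optimize.sort.dynamic.partition=true;
```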

Insert queries I have tried:

q1: from (select col1, col2 ... partition_col from text_table DISTRIBUTE BY partition_col) T insert overwrite table parquet_table partition (partition_col) select col1, col2 ... partition_col;

q2: insert overwrite table parquet_table partition (partition_col) select col1, col2, partition_col from text_table;
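
Hive maps dynamic partition columns by position, so they must come last in the SELECT list, in the same order as in the PARTITION clause. A minimal sketch of q2 with a DISTRIBUTE BY on the partition column, so that all rows for one partition value go to the same reducer and each reducer writes fewer files; col1 and col2 are placeholders for the real column list, as in the queries above:

```sql
-- col1, col2 are placeholders for the real non-partition columns.
-- The dynamic partition column must be the last column in the SELECT list.
-- DISTRIBUTE BY sends all rows with the same partition value to one reducer.
INSERT OVERWRITE TABLE parquet_table PARTITION (partition_col)
SELECT col1, col2, partition_col
FROM text_table
DISTRIBUTE BY partition_col;
```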

Table info available:

The text table has 111 distinct values in the partition column.
Number of rows: 727,921,424
Size: 67.5 GB
Version: CDH 5.4.5
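
For scale, assuming rows and bytes were spread evenly across the 111 partition values (the actual distribution is not known, and skew would make the largest partitions bigger than this), the averages work out to:

```latex
\frac{727{,}921{,}424 \text{ rows}}{111} \approx 6.56 \times 10^{6} \text{ rows per partition value},
\qquad
\frac{67.5 \text{ GB}}{111} \approx 0.61 \text{ GB per partition value}
```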

Please let me know the best approach. Looking forward to your replies.

Thank you

Good day

Rohitha
