Does inserting into a partitioned Hive table (ORC) with MR (MapReduce) work better than with Tez?

We have a Hive table into which data is inserted daily. The incoming data file is usually around 150 GB. By default the load used Tez as the execution engine, and it took around 7-8 hours to load the file completely.

When we switched the execution engine to MR and also enabled parallel execution (hive.exec.parallel=true), the load became much faster: the same data now takes around 30-45 minutes.
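To put the reported timings in perspective, here is a rough throughput calculation in Python. It assumes decimal units (1 GB = 1000 MB) and uses only the figures quoted above (150 GB, 7-8 hours on Tez, 30-45 minutes on MR); the helper name `throughput_mb_per_s` is illustrative and not part of any Hive tooling.

```python
# Rough throughput comparison for the load times reported in this post.
# Assumption: decimal units, 1 GB = 1000 MB.

DATA_GB = 150

# Reported load times in minutes, as (fastest, slowest) ranges.
tez_minutes = (7 * 60, 8 * 60)  # 7-8 hours on Tez
mr_minutes = (30, 45)           # 30-45 minutes on MR


def throughput_mb_per_s(gb: float, minutes: float) -> float:
    """Average rate in MB/s needed to load `gb` GB in `minutes` minutes."""
    return gb * 1000 / (minutes * 60)


for label, (fastest, slowest) in (("Tez", tez_minutes), ("MR", mr_minutes)):
    low = throughput_mb_per_s(DATA_GB, slowest)   # slowest run -> lowest rate
    high = throughput_mb_per_s(DATA_GB, fastest)  # fastest run -> highest rate
    print(f"{label}: {low:.1f}-{high:.1f} MB/s")

# Smallest speedup: fastest Tez run vs. slowest MR run.
# Largest speedup: slowest Tez run vs. fastest MR run.
min_speedup = tez_minutes[0] / mr_minutes[1]
max_speedup = tez_minutes[1] / mr_minutes[0]
print(f"MR speedup: {min_speedup:.1f}x to {max_speedup:.1f}x")
```

Run as-is, it prints about 5.2-6.0 MB/s for Tez and 55.6-83.3 MB/s for MR, i.e. MR finished the same load roughly 9.3x to 16.0x faster.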

Can anybody help us understand why this is?

These are the configurations we used:

Configuration before (using Tez):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.auto.convert.join.noconditionaltask.size=340000000;
set hive.tez.container.size=1024;
set tez.runtime.io.sort.mb=410;
set hive.execution.engine=tez;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=160000000;

Configuration after (using MR, i.e. MapReduce):

SET hive.execution.engine=mr;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.merge.tezfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=102400000000;
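Because the two blocks differ in several settings at once, it can help to list exactly what changed between them. The Python sketch below copies both blocks verbatim from this post and diffs them. It assumes one `set key=value;` statement per line and treats a property missing from a block as "(unset)", meaning whatever session or cluster default applies. The helper names `parse_settings` and `diff_settings` are hypothetical.

```python
# Diff the two Hive session configurations quoted in this post.
# Assumption: one `set key=value;` statement per line.

TEZ_CONFIG = """
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.auto.convert.join.noconditionaltask.size=340000000;
set hive.tez.container.size=1024;
set tez.runtime.io.sort.mb=410;
set hive.execution.engine=tez;
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.tezfiles=true;
set hive.merge.smallfiles.avgsize=160000000;
"""

MR_CONFIG = """
SET hive.execution.engine=mr;
set hive.exec.parallel=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.merge.tezfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=102400000000;
"""


def parse_settings(block: str) -> dict[str, str]:
    """Map property name -> value for each `set key=value;` line ("set" matched case-insensitively)."""
    settings = {}
    for raw in block.strip().splitlines():
        line = raw.strip().rstrip(";")
        if not line.lower().startswith("set "):
            continue  # skip anything that is not a `set` statement
        key, _, value = line[4:].partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(before: dict[str, str], after: dict[str, str]):
    """Return (key, before_value, after_value) for every property that differs; None means unset."""
    keys = sorted(before.keys() | after.keys())
    return [(k, before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)]


tez = parse_settings(TEZ_CONFIG)
mr = parse_settings(MR_CONFIG)
for key, old, new in diff_settings(tez, mr):
    print(f"{key}: {old or '(unset)'} -> {new or '(unset)'}")
```

With these inputs it reports seven differences: hive.execution.engine (tez -> mr), hive.exec.parallel (set only in the MR block), hive.merge.smallfiles.avgsize (160000000 -> 102400000000 bytes, i.e. about 160 MB vs. about 102 GB), and four properties set only in the Tez block (hive.auto.convert.join.noconditionaltask.size, hive.tez.container.size, tez.runtime.io.sort.mb, hive.merge.mapfiles). The remaining four settings are identical in both blocks.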