04-26-2016
06:02 PM
@Kevin Sievers Hi Kevin, your commands look good to me, but somehow Hive does not pick up the number of reduce tasks. You are right, Hadoop should be MUCH faster. The single reduce task and, even weirder, the single mapper seem to be the problem. I can assure you that a load like this normally runs with a lot of mappers and 40 reducers, loading and transforming around 300 GB of data in 20 minutes on a 7-datanode cluster.
So basically I have NO idea why it uses only one mapper, why it has the second reducer AT ALL, or why it ignores the mapred.reduce.tasks parameter. I think a support ticket might be in order.
set hive.tez.java.opts = "-Xmx3600m";
set hive.tez.container.size = 4096;
set mapred.reduce.tasks=120;
CREATE EXTERNAL TABLE STAGING ...
...
insert into TABLE TARGET partition (day = 20150811) SELECT * FROM STAGING distribute by DT ;
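Before opening the ticket, it might be worth checking what the session actually sees. This is only a sketch of checks I would try, not a confirmed fix: it reuses the STAGING and TARGET placeholders from above, and whether any of these settings is the culprit on your version is an assumption on my part.

```sql
-- A bare "set <name>;" prints the value currently in effect for the session.
set mapred.reduce.tasks;
set mapreduce.job.reduces;   -- newer name for the same setting
set hive.execution.engine;   -- confirms whether the query runs on tez or mr

-- Settings Hive uses to estimate the reducer count when no fixed count applies.
set hive.exec.reducers.bytes.per.reducer;
set hive.exec.reducers.max;
set hive.tez.auto.reducer.parallelism;

-- Show the plan to see which map and reduce stages the query contains.
explain insert into TABLE TARGET partition (day = 20150811) SELECT * FROM STAGING distribute by DT;
```

If the printed values do not match what you set, the settings are being overridden somewhere, which would also be useful information for the support ticket.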