How do you force the number of reducers in a map reduce job to be higher?
Labels: Apache Hadoop, Apache Hive
Created ‎04-18-2016 04:04 PM
I'm attempting to copy 30 billion rows from one Hive table into another Hive table. Both tables were created with the same definition and are partitioned on date (DT). There are currently 1173 partitions. I'm using the following query: insert into accesslog_new PARTITION (DT) select * from accesslog;
This query has been running for almost 3 days straight on a cluster with 18 data nodes.
My issue is that the Map-Reduce job only creates one reducer step. Btw, we are using MR2. I'm guessing this is drastically slowing things down. Is there a way to force the number of reducers to be much larger? How do you also figure out what an appropriate number of reducers would be for that volume of data?
Created ‎04-18-2016 07:32 PM
Also add the DISTRIBUTE BY clause as I wrote below; otherwise each reducer will write to all 1173 partitions, which guarantees OOM exceptions. (ORC keeps some memory for every open partition writer in every task.)
I also have no idea why the plan distributes by the second column (_col1); Hive shouldn't add a reducer to a simple SELECT * FROM. Which column is that? If it is DT, then everything is OK.
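For reference, the suggestion above might look roughly like the following. This is only a sketch built from the table and column names in the question; the dynamic-partition settings and their values are assumptions about what this cluster would need, not something confirmed in the thread.

```sql
-- Assumed settings: enable dynamic partitioning for the PARTITION (DT) insert.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Illustrative limits chosen to exceed the 1173 partitions mentioned in the question.
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;

-- DISTRIBUTE BY DT routes all rows of one date to the same reducer,
-- so each reducer only keeps writers open for the partitions it receives.
INSERT INTO TABLE accesslog_new PARTITION (DT)
SELECT * FROM accesslog
DISTRIBUTE BY DT;
```

With this form the useful parallelism is bounded by the number of distinct DT values, and a few very large dates can still leave some reducers with much more work than others.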
Created ‎04-20-2016 03:09 PM
From your example, you seem to be using Tez. Check this article https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.... which has more detail on how reducers can be controlled.
This is different from how it works in MapReduce. hive.exec.reducers.bytes.per.reducer specifies how much data goes to each reducer, which in turn determines the number of reducers.
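As a rough illustration of the settings involved, something like the following could be run in the Hive session before the insert. The numbers are placeholders rather than recommendations for this cluster, and the exact behaviour on Tez depends on the Hive version.

```sql
-- Aim for roughly 256 MB of input per reducer (illustrative value);
-- a smaller value generally leads Hive to plan more reducers.
SET hive.exec.reducers.bytes.per.reducer=268435456;

-- Upper bound on the number of reducers Hive will choose (illustrative value).
SET hive.exec.reducers.max=1009;

-- On plain MapReduce (MR2), a fixed reducer count can be forced instead.
-- Left commented out so Hive keeps estimating the count from the input size.
-- SET mapreduce.job.reduces=200;
```

Forcing a fixed count overrides Hive's own estimate, so it is usually worth trying the bytes-per-reducer setting first.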
