Explorer
Posts: 19
Registered: ‎06-23-2014

Best practices for "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces)?

I've read conflicting advice about the correct value for the "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces) parameter in YARN.

 
Cloudera Manager's default is listed as "1", but other documentation claims that this value should be set to "99% of reduce capacity", which, in the case of a 100-node cluster, might be 99.
 
What is the recommended value for this parameter, on a busy cluster with many jobs running?
Explorer
Posts: 19
Registered: ‎06-23-2014

Re: Best practices for "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces)?

I think the best answer to this question is the following, by Allen Wittenauer from LinkedIn:

 

http://qr.ae/7GNMu9

 

He writes:

 

 

At LinkedIn, I tend to tell users that their ideal reducers should be the optimal value that gets them closest to:
A multiple of the block size
A task time between 5 and 15 minutes
Creates the fewest files possible
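
To make those three criteria a bit more concrete, here is a rough sketch of how one might turn them into a starting number. It is only an illustration: the helper names, the 128 MiB block size, and the idea that you already know the job's total reduce output size and total reduce time from an earlier run are all my assumptions, not something from Allen's answer.

```python
import math

MIB = 1024 * 1024


def suggest_reducers(total_output_bytes, block_size_bytes=128 * MIB,
                     blocks_per_reducer=1):
    """Hypothetical helper: pick a reducer count so that each reducer
    writes roughly `blocks_per_reducer` full blocks of output.

    Assumes the total reduce output size is known (for example from a
    previous run of the same job) and that output is spread evenly.
    """
    if total_output_bytes <= 0:
        return 1
    target_file_bytes = block_size_bytes * blocks_per_reducer
    # Round up so no reducer has to write more than the target size.
    return max(1, math.ceil(total_output_bytes / target_file_bytes))


def task_time_in_window(total_reduce_minutes, reducers, low=5, high=15):
    """Hypothetical check: is the average per-reducer time inside the
    5 to 15 minute window? `total_reduce_minutes` is the assumed sum of
    all reduce task durations from an earlier run."""
    per_task = total_reduce_minutes / reducers
    return low <= per_task <= high


# Example: 50 GiB of reduce output, 4 blocks (512 MiB) per output file.
reducers = suggest_reducers(50 * 1024 * MIB, blocks_per_reducer=4)
print(reducers)
print(task_time_in_window(1000, reducers))
```

Raising blocks_per_reducer gives fewer, larger files and longer tasks; lowering it does the opposite, so you can adjust it until the task time lands in the window. The resulting number can then be set per job rather than cluster-wide, for example with -D mapreduce.job.reduces=<n> on the command line (for jobs that go through ToolRunner) or Job.setNumReduceTasks(n) in the driver.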

 

 
