Support Questions
Find answers, ask questions, and share your expertise

Best practices for "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces)?


I've read conflicting advice on the correct value for the "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces) parameter in YARN.

 
Cloudera Manager's default is "1", but other documentation claims this value should be set to 99% of the cluster's reduce capacity; on a 100-node cluster, that might be 99.
What is the recommended value for this parameter, on a busy cluster with many jobs running?
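For reference, the "99% of reduce capacity" figure is just arithmetic over the cluster's total reduce slots. A minimal sketch (the node count and per-node slot count below are hypothetical examples, not recommendations from this thread):

```python
# Illustrative arithmetic for the "99% of reduce capacity" heuristic.
# The cluster figures used here are hypothetical, not tuning advice.

def reduces_at_99_percent(nodes, reduce_slots_per_node):
    """Return 99% of the cluster's total reduce capacity, rounded down."""
    total_capacity = nodes * reduce_slots_per_node
    return int(total_capacity * 0.99)

# A 100-node cluster with 1 reduce slot per node yields 99 reducers,
# matching the figure quoted in the question.
print(reduces_at_99_percent(100, 1))  # 99
```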
1 REPLY 1

Re: Best practices for "Default Number of Reduce Tasks per Job" (mapreduce.job.reduces)?

I think the best answer to this question is the following, by Allen Wittenauer from LinkedIn:

http://qr.ae/7GNMu9

He writes:

At LinkedIn (company), I tend to tell users that their ideal reducers should be the optimal value that gets them closest to:
A multiple of the block size
A task time between 5 and 15 minutes
Creates the fewest files possible
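One rough way to turn those three heuristics into a number is to size each reducer's share of the input as a whole multiple of the block size that a reducer can chew through in the target task time, then take the fewest reducers that cover the input. This is only a sketch under assumptions: the block size and per-reducer throughput below are made-up parameters, not Hadoop APIs or measured values.

```python
def suggest_reducers(input_bytes, block_size=128 * 1024 * 1024,
                     reducer_mb_per_min=100, target_minutes=10):
    """Estimate a reducer count so each task processes a whole multiple of
    the block size and runs roughly `target_minutes` (between 5 and 15).
    Throughput and block size are illustrative assumptions."""
    # Bytes one reducer can handle in the target window (assumed throughput).
    per_reducer = reducer_mb_per_min * 1024 * 1024 * target_minutes
    # Round the per-reducer share down to a whole number of blocks (>= 1).
    blocks_per_reducer = max(1, per_reducer // block_size)
    per_reducer = blocks_per_reducer * block_size
    # Fewest reducers (and hence fewest output files) that cover the input.
    return max(1, -(-input_bytes // per_reducer))  # ceiling division

# e.g. 1 TiB of input with the assumed defaults:
print(suggest_reducers(1024 ** 4))
```

The result would then be passed to the job (for example via mapreduce.job.reduces) rather than relying on a cluster-wide default; the point of the heuristics is that the right value is per-job, driven by data size, not a single global setting.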