How do you force the number of reducers in a map reduce job to be higher?

Re: How do you force the number of reducers in a map reduce job to be higher?

Also add the DISTRIBUTE BY clause, as I wrote below. Otherwise each reducer will write to all 1173 partitions, which guarantees OOM exceptions. (ORC keeps some memory for every task.)

I also really have no idea why it distributes by the second column (_col1). It shouldn't add a reducer to a simple SELECT * FROM. Which column is that? If it is DT, then everything is OK.
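For illustration, a minimal sketch of an insert with a DISTRIBUTE BY clause on the partition column. The table names (source_table, target_orc) and the columns col_a and col_b are hypothetical; only DT comes from this thread, and it is assumed here to be the partition column of the target table.

```sql
-- Hypothetical table and column names; dt is assumed to be the partition column.
-- Dynamic partitioning has to be enabled for a PARTITION (dt) insert without a fixed value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE target_orc PARTITION (dt)
SELECT col_a, col_b, dt
FROM source_table
-- Send all rows with the same dt to the same reducer, so each reducer
-- only opens ORC writers for the partitions it actually receives.
DISTRIBUTE BY dt;
```

With this layout a reducer handles a subset of the partitions instead of holding an open writer for every one of them, which is what keeps memory use down.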

Re: How do you force the number of reducers in a map reduce job to be higher?

From your example, you seem to be using Tez. Check this article https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.... which has more detail on how reducers can be controlled.

This is different from how it works in MapReduce. hive.exec.reducers.bytes.per.reducer specifies how much data goes to each reducer, which in turn determines the number of reducers.
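As a rough sketch of the settings involved (the values are illustrative, not recommendations, and the exact effect on Tez depends on the version and on whether automatic reducer parallelism is enabled):

```sql
-- Lower the target data volume per reducer to get more reducers.
-- 67108864 bytes = 64 MB; an illustrative value only.
SET hive.exec.reducers.bytes.per.reducer=67108864;

-- Upper bound on the number of reducers Hive will plan for.
SET hive.exec.reducers.max=1009;

-- Alternatively, request a fixed reducer count instead of letting Hive estimate it.
-- -1 means Hive works out the number itself.
SET mapred.reduce.tasks=200;
```

Halving the bytes-per-reducer value should roughly double the estimated reducer count, up to the hive.exec.reducers.max limit.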
