
Reducer Parameter on TEZ

Hi Team,

By default, reducers start once 85% of the mappers have completed. Is there any option to hold the reducers back until all the mappers have completed?

Is there a parameter to set this in Tez as well?

Re: Reducer Parameter on TEZ

@suresh krish

You can tweak the value of mapred.reduce.slowstart.completed.maps to control how early the reducers start.

Re: Reducer Parameter on TEZ

Thanks for your response. So if I reduce the value, the reducers will not start until all the mappers have completed, right? Any idea what value would hold the reducers back until all the mappers finish?

Re: Reducer Parameter on TEZ

The setting is the fraction of mappers that have to finish before a reducer is started, so mapred.reduce.slowstart.completed.maps=1.0 will wait until all maps are finished.
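
To make the arithmetic concrete, here is a minimal Python sketch of how such a fraction turns into a task count. It is only an illustration: the function name is made up for this example, and rounding the product up to a whole task is an assumption about how the threshold is applied, not something confirmed in this thread.

```python
import math


def maps_needed_before_reducers(total_maps: int, slowstart: float) -> int:
    """Return how many map tasks must finish before reducers may start.

    `slowstart` plays the role of mapred.reduce.slowstart.completed.maps.
    Rounding up to a whole task is an assumption made for this sketch.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    # Compare a half-way threshold with the "wait for every map" setting.
    for fraction in (0.5, 1.0):
        needed = maps_needed_before_reducers(200, fraction)
        print(f"slowstart={fraction}: {needed} of 200 maps must finish first")
```

With a value of 1.0 the count equals the total number of maps, which is the "wait for all mappers" behaviour asked about above.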

Re: Reducer Parameter on TEZ

@suresh krish

You need to set mapred.reduce.slowstart.completed.maps in mapred-site.xml. The value is a fraction between 0 and 1.

If you need the reducers to start only after all map tasks have completed, set mapred.reduce.slowstart.completed.maps=1.0

An ideal setting would be mapred.reduce.slowstart.completed.maps=0.8 (or 0.9), so that the reducers start only after 80% (or 90%, respectively) of the map tasks have completed.

In recent versions of Hadoop (HDP 2.4.1) the parameter has been renamed to mapreduce.job.reduce.slowstart.completedmaps

You can also set this parameter on a per-job basis.

Re: Reducer Parameter on TEZ

The parameter works for MR but not for Tez. Is there another parameter for Tez?

Re: Reducer Parameter on TEZ

You are actually correct. Tez has two other parameters:

  1. tez.shuffle-vertex-manager.min-src-fraction=0.25;
  2. tez.shuffle-vertex-manager.max-src-fraction=0.75;

So I suppose that if you set both to 1.0, it should have the same effect.
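
As a rough illustration of why setting both to 1.0 should work, here is a small Python sketch of the ramp-up as I understand it: nothing is scheduled below the min fraction, everything is schedulable at the max fraction, and the share grows linearly in between. The function name is hypothetical and the model is a simplification, not the actual Tez code.

```python
def schedulable_reducer_fraction(completed: float,
                                 min_src: float = 0.25,
                                 max_src: float = 0.75) -> float:
    """Return the share of reducer tasks that may be scheduled.

    `completed` is the fraction of source (map) tasks already finished.
    `min_src` and `max_src` stand in for
    tez.shuffle-vertex-manager.min-src-fraction and
    tez.shuffle-vertex-manager.max-src-fraction.
    The linear ramp between them is an assumption of this sketch.
    """
    if completed < min_src:
        return 0.0  # below the minimum: no reducers yet
    if completed >= max_src or max_src <= min_src:
        return 1.0  # at or past the maximum: all reducers may be scheduled
    return (completed - min_src) / (max_src - min_src)


if __name__ == "__main__":
    # Ramp with the values quoted above, then with both set to 1.0.
    for done in (0.1, 0.5, 0.75, 0.99, 1.0):
        ramp = schedulable_reducer_fraction(done)
        strict = schedulable_reducer_fraction(done, 1.0, 1.0)
        print(f"maps done={done}: ramp={ramp}, both at 1.0={strict}")
```

In this model, with both fractions at 1.0 the result stays at zero until every source task has finished, which mirrors mapred.reduce.slowstart.completed.maps=1.0 on the MR side.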

Now, you should be a bit careful with that. Hive on Tez does some magic, like keeping containers around in case they are needed later on (tez.am.container.idle.release-timeout-min.millis), so changing those parameters might just mean that some containers sit idle for a while.

https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer....

Re: Reducer Parameter on TEZ

Hi Benjamin, will making the reducers start only after the mappers are 100% complete give any performance improvement? Or is this a best practice? Can you please advise?

Re: Reducer Parameter on TEZ

The reason for the early start is that reducers can begin copying data from already finished map tasks while the remaining map tasks finish. So the best practice is to have them ramp up gradually, as is the default. This will make the query finish faster. That is why these parameters exist.

It will not impact the existing job, since map tasks are allocated first.

However, you might impact OTHER tasks, because more tasks are running. So I have disabled it in situations of high concurrency where I wanted the highest possible throughput for all queries. It depends on your query, though: Tez will hang on to containers for 10 seconds anyhow, so as long as your mappers do not take too long you will not get much benefit. It might be different for very long-running mappers/reducers.

That is the reason I don't like "What is best practice?" questions. The answer is always that it depends on your concrete situation and queries.