Distribution of key-value pairs across mappers and reducers

I have a table with 3 columns (customer id, amount, quantity) that holds 1 billion records. My query is select customer_id, sum(amount) from my_table group by customer_id. I want to understand how the mappers and reducers pick up the data.

I assume each mapper emits the keys found in its own portion of the input. After that, the shuffle groups all identical keys together. Then the reducers perform the sum and deliver the result. Please correct me if I'm wrong here.

Also, when the reducers perform the sum, does each reducer work only on the key-value pairs fed from its own mapper? If a single key has a huge volume that can't be handled by one reducer, can it be split across multiple reducers?

1 ACCEPTED SOLUTION

Contributor

Emit the customer id as the key from the mapper and the amount as the value.

Since your data is large, you can set the reducer class as the combiner, so that part of the summing is performed on the map side. This is safe here because addition is associative and commutative, so partial sums add up to the same total.

j.setCombinerClass(reducerclass.class);

You can increase the number of reducers with:

j.setNumReduceTasks(3); // creates 3 reducers

This way your program uses both concepts: the combiner, which pre-aggregates each mapper's output, and the partitioner, which decides which reducer each key is sent to.
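To make the effect of the combiner concrete, here is a minimal plain-Java simulation. It does not use the Hadoop API; the class `CombinerSketch`, its `Pair` type, and the sample records are hypothetical names and data chosen only for illustration. It shows each mapper's output being pre-summed per customer before the reduce step adds up the partial sums.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop dependency) of map-side combining.
// All names here are illustrative assumptions, not part of any Hadoop API.
class CombinerSketch {

    // One (customerId, amount) record, as a mapper would emit it.
    static final class Pair {
        final String customerId;
        final long amount;

        Pair(String customerId, long amount) {
            this.customerId = customerId;
            this.amount = amount;
        }
    }

    // Combiner step: sum amounts per customer within ONE mapper's output.
    static Map<String, Long> combine(List<Pair> mapperOutput) {
        Map<String, Long> partial = new TreeMap<>();
        for (Pair p : mapperOutput) {
            partial.merge(p.customerId, p.amount, Long::sum);
        }
        return partial;
    }

    // Reduce step: add up the partial sums coming from every mapper.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((k, v) -> totals.merge(k, v, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Pair> mapper1 = List.of(new Pair("c1", 10), new Pair("c2", 5), new Pair("c1", 20));
        List<Pair> mapper2 = List.of(new Pair("c1", 7), new Pair("c2", 3));

        // Five raw records shrink to four partial sums before the shuffle.
        Map<String, Long> totals = reduce(List.of(combine(mapper1), combine(mapper2)));
        System.out.println(totals); // prints {c1=37, c2=8}
    }
}
```

With a billion rows and many repeated customer ids per mapper, the same idea means far fewer records cross the network during the shuffle, while the final sums stay the same.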

2 REPLIES

Rising Star

@Bala Vignesh N V

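"Also, when the reducers perform the sum, does each reducer work only on the key-value pairs fed from its own mapper?"

No. Each reducer fetches its share of the keys from the outputs of all the mappers, so every value for a given key reaches the same reducer regardless of which mapper emitted it.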
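The routing described here can be sketched in plain Java, again without the Hadoop API. The class `PartitionSketch` and the sample data are hypothetical. The formula in `partitionFor` follows the one used by Hadoop's default `HashPartitioner`, but it is applied to `String.hashCode()` here, so the actual reducer numbers in a real job (which hashes its own key type) may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the shuffle: every mapper's output is split by key,
// and each reducer collects its keys from ALL mappers.
// Names are illustrative assumptions, not part of any Hadoop API.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner, applied to a String key.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns, for each reducer, a map of key -> values gathered from every mapper.
    static List<Map<String, List<Long>>> shuffle(List<Map<String, Long>> mapperOutputs,
                                                 int numReduceTasks) {
        List<Map<String, List<Long>>> reducerInputs = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            reducerInputs.add(new TreeMap<>());
        }
        for (Map<String, Long> output : mapperOutputs) {
            for (Map.Entry<String, Long> e : output.entrySet()) {
                int r = partitionFor(e.getKey(), numReduceTasks);
                reducerInputs.get(r)
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        // Per-customer amounts from two mappers (for example, after a combiner ran).
        Map<String, Long> mapper1 = Map.of("c1", 30L, "c2", 5L);
        Map<String, Long> mapper2 = Map.of("c1", 7L, "c2", 3L);

        List<Map<String, List<Long>>> inputs = shuffle(List.of(mapper1, mapper2), 3);
        for (int r = 0; r < inputs.size(); r++) {
            System.out.println("reducer " + r + " receives " + inputs.get(r));
        }
    }
}
```

Because the partition depends only on the key, all values for one customer id land on a single reducer. With this default scheme a very hot key is therefore not spread over several reducers; pre-aggregating with a combiner is what keeps that reducer's input small.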