Created 03-08-2017 08:50 AM
I have a table with 3 columns (customer_id, amount, quantity) which has 1 billion records. My query is: select customer_id, sum(amount) from my_table group by customer_id. I wanted to understand how the mappers and reducers pick the data.
I assume each mapper will pick up the keys in its input split. After that, the shuffle will group all occurrences of the same key together. Then the reducers will perform the sum operation and deliver the result. Please correct me if I'm wrong here.
Also, when the reducers perform the sum operation, will each reducer work only on the key-value pairs that were fed to it from its own mapper job? If the same key has a huge volume that can't be handled by a single reducer, can it be split across multiple reducers?
Created 03-08-2017 10:08 AM
You can emit from the mapper:
key : customer id
value : amount
Since your data is large, you can set the combiner to your reducer class so that part of the summing is performed on the map side:
j.setCombinerClass(SumReducer.class); // SumReducer is your reducer class
You can increase the number of reducers by using:
j.setNumReduceTasks(3); // it creates 3 reducers.
You can use both concepts, combiners and partitioners, in your program; a full job setup is sketched below.
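As a rough sketch (assuming comma-separated input lines of customer_id,amount,quantity; the names CustomerSum, SumMapper and SumReducer are placeholders, not from this thread), the whole job could look like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomerSum {

    // Emits (customer id, amount) for every input record.
    public static class SumMapper extends Mapper<Object, Text, Text, DoubleWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes comma-separated lines: customer_id,amount,quantity
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]),
                    new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }

    // Sums all amounts for one customer id; sum is associative and
    // commutative, so the same class can be reused as the combiner.
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job j = Job.getInstance(new Configuration(), "customer sum");
        j.setJarByClass(CustomerSum.class);
        j.setMapperClass(SumMapper.class);
        j.setCombinerClass(SumReducer.class); // partial sums on the map side
        j.setReducerClass(SumReducer.class);
        j.setNumReduceTasks(3);               // it creates 3 reducers
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(j, new Path(args[0]));
        FileOutputFormat.setOutputPath(j, new Path(args[1]));
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}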
Created 03-14-2017 11:46 AM
Also, when the reducers perform the sum operation, will each reducer work only on the key-value pairs that were fed to it from its own mapper job? -- Each reducer operates on the map outputs from all of the mappers: the shuffle partitions every mapper's output by key, so a given reducer receives all the values for its assigned keys, no matter which mapper produced them.
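This is also why a single hot key cannot be split across reducers in plain MapReduce: every occurrence of a key is routed to exactly one partition. The sketch below mirrors the behaviour of Hadoop's default HashPartitioner (CustomerPartitioner is a hypothetical name); the combiner suggested above is the usual way to ease the load of a skewed key.

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Mirrors the default HashPartitioner: the same key always hashes to the
// same partition, so all of one customer's amounts reach a single reducer.
public class CustomerPartitioner extends Partitioner<Text, DoubleWritable> {
    @Override
    public int getPartition(Text key, DoubleWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}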