Created 03-08-2017 08:50 AM
I have a table with 3 columns (customer_id, amount, quantity) which has 1 billion records. My query is: select customer_id, sum(amount) from my_table group by customer_id. I wanted to understand how the mappers and reducers pick the data.
I assume each mapper will pick up the keys in its input split. After that, the shuffle will group all occurrences of the same key together. Then the reducers will perform the sum operation and deliver the result. Please correct me if I'm wrong here.
Also, when the reducers perform the sum operation, will each reducer work only on the key-value pairs that were fed to it from its own mapper job? If the same key has a huge volume that can't be handled by a single reducer, can it be split across multiple reducers?
Created 03-08-2017 10:08 AM
You can emit from the mapper:
key : customer id
value : amount
Since your data is large, you can set the combiner to your reducer class so that part of the summing is performed on the map side:
j.setCombinerClass(SumReducer.class); // SumReducer is your reducer class
You can increase the number of reducers by using:
j.setNumReduceTasks(3); // it creates 3 reducers.
You can use both concepts, combiners and partitioners, in your program; a full job setup is sketched below.
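As a rough sketch (assuming comma-separated input lines of customer_id,amount,quantity; the names CustomerSum, SumMapper and SumReducer are placeholders, not from this thread), the whole job could look like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomerSum {

    // Emits (customer id, amount) for every input record.
    public static class SumMapper extends Mapper<Object, Text, Text, DoubleWritable> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes comma-separated lines: customer_id,amount,quantity
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]),
                    new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }

    // Sums all amounts for one customer id; sum is associative and
    // commutative, so the same class can be reused as the combiner.
    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job j = Job.getInstance(new Configuration(), "customer sum");
        j.setJarByClass(CustomerSum.class);
        j.setMapperClass(SumMapper.class);
        j.setCombinerClass(SumReducer.class); // partial sums on the map side
        j.setReducerClass(SumReducer.class);
        j.setNumReduceTasks(3);               // it creates 3 reducers
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(j, new Path(args[0]));
        FileOutputFormat.setOutputPath(j, new Path(args[1]));
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}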
Created 03-14-2017 11:46 AM
Also, when the reducers perform the sum operation, will each reducer work only on the key-value pairs that were fed to it from its own mapper job? -- Each reducer operates on the map outputs from all of the mappers: the shuffle partitions every mapper's output by key, so a given reducer receives all the values for its assigned keys, no matter which mapper produced them.
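This is also why a single hot key cannot be split across reducers in plain MapReduce: every occurrence of a key is routed to exactly one partition. The sketch below mirrors the behaviour of Hadoop's default HashPartitioner (CustomerPartitioner is a hypothetical name); the combiner suggested above is the usual way to ease the load of a skewed key.

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Mirrors the default HashPartitioner: the same key always hashes to the
// same partition, so all of one customer's amounts reach a single reducer.
public class CustomerPartitioner extends Partitioner<Text, DoubleWritable> {
    @Override
    public int getPartition(Text key, DoubleWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}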