Distribution of key/value pairs in mappers and reducers
Labels: Apache Hadoop, Apache Hive
Created 03-08-2017 08:50 AM
I have a table with 3 columns (customer_id, amount, quantity) which has 1 billion records. My query is: select customer_id, sum(amount) from my_table group by customer_id. I wanted to understand how the mappers and reducers pick up the data.
I assume each mapper emits the keys from its own input split. After that, the shuffle phase groups all identical keys together. Then the reducers perform the sum operation and deliver the result. Please correct me if I'm wrong here.
Also, when the reducers perform the sum operation, will each reducer work only on the key/value pairs that were fed from its own mapper job? And if a single key has a huge volume that can't be handled by one reducer, can it be split across multiple reducers?
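The map → shuffle → reduce flow described in the question can be sketched as a toy in-memory simulation (plain Java, not the Hadoop API; class and method names here are illustrative):

```java
import java.util.*;

// Toy simulation of map -> shuffle -> reduce for
// "select customer_id, sum(amount) from my_table group by customer_id".
public class GroupBySimulation {

    // Map phase: each input row (customer_id, amount, quantity)
    // is emitted as a (customer_id, amount) pair.
    static List<Map.Entry<String, Double>> map(List<String[]> rows) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(Map.entry(row[0], Double.parseDouble(row[1])));
        }
        return out;
    }

    // Shuffle phase: group all values by key, as the framework does
    // before handing each key's value list to a reducer.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped value list for each key.
    static Map<String, Double> reduce(Map<String, List<Double>> grouped) {
        Map<String, Double> sums = new TreeMap<>();
        grouped.forEach((k, vs) ->
            sums.put(k, vs.stream().mapToDouble(Double::doubleValue).sum()));
        return sums;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"c1", "10.0", "2"},
            new String[]{"c2", "5.0", "1"},
            new String[]{"c1", "7.5", "3"});
        System.out.println(reduce(shuffle(map(rows)))); // {c1=17.5, c2=5.0}
    }
}
```

In real Hadoop the three phases run on different machines and the "shuffle" moves data over the network, but the logical dataflow is the same.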
Created 03-08-2017 10:08 AM
You can emit from the mapper:
key: customer_id
value: amount
Since your data is large, you can set the combiner to your reducer class so that part of the summing is performed on the map side:
j.setCombinerClass(reducerclass.class);
You can increase the number of reducers with:
j.setNumReduceTasks(3); // creates 3 reduce tasks
You can use both concepts, combiners and partitioners, in your program.
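To see what the combiner buys you, here is an illustrative sketch in plain Java (not the Hadoop API): each mapper pre-sums its own (customer_id, amount) pairs, so only one partial sum per key per mapper crosses the network during the shuffle, and the reducer merges the partial sums.

```java
import java.util.*;

// Sketch of combiner-style partial aggregation outside Hadoop.
public class CombinerSketch {

    // Combine: partial sum of amounts per key within one mapper's output.
    static Map<String, Double> combine(List<Map.Entry<String, Double>> mapOutput) {
        Map<String, Double> partial = new HashMap<>();
        for (Map.Entry<String, Double> p : mapOutput) {
            partial.merge(p.getKey(), p.getValue(), Double::sum);
        }
        return partial;
    }

    // Reduce: merge the partial sums arriving from all mappers.
    static Map<String, Double> reduce(List<Map<String, Double>> partials) {
        Map<String, Double> total = new TreeMap<>();
        for (Map<String, Double> partial : partials) {
            partial.forEach((k, v) -> total.merge(k, v, Double::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two mappers, each combining its own slice of the table locally.
        Map<String, Double> m1 = combine(List.of(Map.entry("c1", 10.0), Map.entry("c1", 7.5)));
        Map<String, Double> m2 = combine(List.of(Map.entry("c1", 2.5), Map.entry("c2", 5.0)));
        System.out.println(reduce(List.of(m1, m2))); // {c1=20.0, c2=5.0}
    }
}
```

This works for sum because addition is associative and commutative; that is also why Hadoop lets you reuse the reducer class as the combiner for this kind of query.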
Created 03-14-2017 11:46 AM
Also, when the reducers perform the sum operation, will each reducer work only on the key/value pairs that were fed from its own mapper job? -- No: each reducer receives the map outputs for its assigned partition of keys from all mappers. The partitioner guarantees that all values for a given key land on exactly one reducer, so a single key is never split across reducers.
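The key-to-reducer routing above is what Hadoop's default HashPartitioner does: it computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so every record with the same key always goes to the same reducer. A minimal standalone sketch of that formula:

```java
// Mirrors Hadoop's default HashPartitioner logic in plain Java:
// (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
// Masking with Integer.MAX_VALUE keeps the result non-negative
// even when hashCode() is negative.
public class PartitionDemo {

    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        // Every record with customer_id "c1" lands on the same reducer,
        // which is why one key cannot be split across reducers.
        System.out.println(partitionFor("c1", numReducers) == partitionFor("c1", numReducers)); // true
    }
}
```

If one key is too hot for a single reducer, the usual workaround is two-stage aggregation: salt the key (e.g. append a random suffix) for a first partial-sum job, then strip the salt and sum again in a second job.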
