- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
In Mapreduce how to sort intermediate output based on values?
- Labels:
-
Apache Hadoop
-
Apache Hive
Created ‎12-26-2018 11:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How to sort intermediate output based on values In MapReduce?
Created ‎12-27-2018 06:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The MapReduce sort the intermediate data(between mapper and reducer phase) by key by default. If we want the data should be sort based on value, then we need secondary sorting.
For more Information you can reference below links:
https://www.oreilly.com/library/view/data-algorithms/9781491906170/ch01.html
Please accept the answer you found most useful.
Created ‎12-27-2018 12:16 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorting is carried out at the Map side. When all the map outputs have been copied, the reduce task moves into the sort phase i.e.maerging phase. which merges the map outputs, maintaining their sort ordering. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (the default, controlled by the mapreduce.task.io.sort.factor property, just like in the map’s merge), there would be four rounds. Each round would merge 15 files into 1, so at the end, there would be 4 intermediate files to be processed. This is done using a key-value pair.
