Created on 03-16-2016 02:29 PM - edited 09-16-2022 03:09 AM
One of the objectives in the HDPCD:Java exam is to sort the "output" of an MR job using http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/Job.html#setGroupingComparatorC... My understanding is that the grouping comparator is for grouping records from multiple partitions. How can it be used for sorting? Do you mean setSortComparatorClass?
Thanks for your help!
Created 03-16-2016 02:42 PM
Sorting in MR applies to two areas: sorting the map output by key, and secondary sort (ordering the values that arrive within each key group).
The exam objective you listed above refers to both. The first one is fairly straightforward - you implement the compareTo method in your key class. The secondary sort involves a bit more work. There is a nice blog post with an example of how to implement a secondary sort:
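As a minimal sketch of the key-ordering part, here is a hypothetical composite key (year, temperature; both names are illustrative, not from the exam objective) whose compareTo orders by year and then by temperature descending. In a real job this class would implement org.apache.hadoop.io.WritableComparable and also provide write/readFields; this plain-Java version shows only the ordering logic and uses Collections.sort in place of the framework's shuffle sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical composite key: natural key (year) plus the field to be ordered
// within each year (temp). A Hadoop version would implement WritableComparable.
public class YearTempKey implements Comparable<YearTempKey> {
    final int year;
    final int temp;

    YearTempKey(int year, int temp) {
        this.year = year;
        this.temp = temp;
    }

    // Primary order: year ascending. Secondary order: temperature descending.
    @Override
    public int compareTo(YearTempKey other) {
        int cmp = Integer.compare(year, other.year);
        if (cmp != 0) {
            return cmp;
        }
        return Integer.compare(other.temp, temp);
    }

    @Override
    public String toString() {
        return year + "\t" + temp;
    }

    public static void main(String[] args) {
        List<YearTempKey> keys = new ArrayList<>();
        keys.add(new YearTempKey(1950, 10));
        keys.add(new YearTempKey(1949, 22));
        keys.add(new YearTempKey(1950, 35));
        keys.add(new YearTempKey(1949, 111));
        // The MR framework sorts keys during the shuffle; Collections.sort
        // stands in for that step here and uses the same compareTo.
        Collections.sort(keys);
        keys.forEach(System.out::println);
    }
}
```

Running main prints the keys grouped by year, with the highest temperature first within each year. If no sort comparator is registered on the job, Hadoop falls back to the key class's own ordering, which is why compareTo alone covers the plain key sort.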
Created 03-16-2016 03:38 PM
Thanks @Rich Raposa. It was actually a little confusing to see ONLY setGroupingComparatorClass mentioned in the objective, since a secondary sort involves writing comparator classes for both sorting and grouping and registering them with both setSortComparatorClass and setGroupingComparatorClass.
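To make the division of labor concrete, here is a plain-Java simulation (no Hadoop dependency; the Record type and the sample data are hypothetical) of what the framework does with the two comparators. SORT plays the role of the class passed to job.setSortComparatorClass and orders the full composite key; GROUP plays the role of the class passed to job.setGroupingComparatorClass and decides which consecutive sorted keys share one reduce() call. In a real job you would also set a Partitioner on the natural key (job.setPartitionerClass) so that all records for one year reach the same reducer; the simulation assumes a single reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simulation of the shuffle in a secondary sort; Record is a hypothetical (year, temp) key.
public class SecondarySortDemo {
    record Record(int year, int temp) {}

    // Role of setSortComparatorClass: year ascending, then temp descending.
    static final Comparator<Record> SORT =
        Comparator.comparingInt(Record::year)
                  .thenComparing(Comparator.comparingInt(Record::temp).reversed());

    // Role of setGroupingComparatorClass: keys equal by year share one reduce() call.
    static final Comparator<Record> GROUP = Comparator.comparingInt(Record::year);

    // Returns one line per simulated reduce() call: "year: values in arrival order".
    static List<String> shuffleAndReduce(List<Record> mapOutput) {
        List<Record> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Record first = sorted.get(i);
            StringBuilder values = new StringBuilder();
            // Consume consecutive records that the grouping comparator considers equal.
            while (i < sorted.size() && GROUP.compare(first, sorted.get(i)) == 0) {
                if (values.length() > 0) {
                    values.append(',');
                }
                values.append(sorted.get(i).temp());
                i++;
            }
            calls.add(first.year() + ": " + values);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Record> out = List.of(new Record(1950, 10), new Record(1949, 22),
                                   new Record(1950, 35), new Record(1949, 111));
        shuffleAndReduce(out).forEach(System.out::println);
    }
}
```

The grouping comparator does not sort anything by itself: it only works because the sort comparator has already placed the records for each year next to each other in the desired order, so each simulated reduce call sees its values already sorted by temperature.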