What is the difference between a map-side join and a reduce-side join in Hadoop?
Hadoop supports two kinds of joins for combining two or more data sets on a common column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small, whereas a reduce-side join can join two large data sets.
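The map-side join is faster because the join is completed in the map phase, so the data does not have to be shuffled and sorted for a reducer, and a reducer cannot begin reducing until all mappers have completed. The reduce-side join, which goes through that shuffle, is therefore slower.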
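As a rough illustration of the small-plus-large case, the sketch below simulates a map-side join in plain Python. It does not use any Hadoop API; the function names (build_lookup, map_side_join) and the sample records are hypothetical, and the small data set is assumed to fit in memory.

```python
# Plain-Python simulation of a map-side join; no Hadoop APIs are used.

def build_lookup(small_records):
    """Load the small data set into an in-memory dict keyed by the join column."""
    return {key: value for key, value in small_records}

def map_side_join(large_split, lookup):
    """Join one split of the large data set inside the 'mapper'.

    Each record is matched against the in-memory lookup, so no shuffle,
    sort, or reduce phase is involved.
    """
    for key, value in large_split:
        if key in lookup:  # inner join: records without a match are dropped
            yield key, value, lookup[key]

# Hypothetical sample data: (dept_id, dept_name) and (dept_id, employee)
departments = [(1, "sales"), (2, "engineering")]
employees = [(1, "ann"), (2, "bob"), (3, "cid"), (1, "dee")]

lookup = build_lookup(departments)
joined = list(map_side_join(employees, lookup))
for row in joined:
    print(row)
```

Every mapper would hold its own copy of the lookup, which is why this approach only suits a data set small enough to keep in memory.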
Map Side Join:
· Both inputs must be sorted by the same join key.
· Both inputs must have an equal number of partitions.
· All records with the same key must be in the same partition.
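The three conditions above are what allow each pair of matching partitions to be joined with a single merge pass. The following is a minimal plain-Python sketch of that idea, not Hadoop code; partition and merge_join are hypothetical helpers, the keys are assumed to be integers, and the modulo rule stands in for whatever partitioning the real inputs share.

```python
# Plain-Python simulation of a merge join over co-partitioned, sorted inputs.

def partition(records, num_partitions):
    """Split records into partitions with one shared rule, each sorted by key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[key % num_partitions].append((key, value))  # assumes integer keys
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]

def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side records that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair each left record for this key with every right record in the run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out

# Hypothetical sample data: (customer_id, order) and (customer_id, name)
orders = [(3, "o1"), (1, "o2"), (2, "o3"), (1, "o4")]
customers = [(2, "bob"), (1, "ann"), (4, "dan")]

order_parts = partition(orders, 2)
customer_parts = partition(customers, 2)

# Partition n of one input only ever needs partition n of the other.
joined = [row
          for lp, rp in zip(order_parts, customer_parts)
          for row in merge_join(lp, rp)]
```

If the two inputs were partitioned with different rules or different partition counts, records with the same key could land in partitions that are never paired, and those matches would be missed.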
Reduce Side Join:
· More flexible to implement, since the inputs do not need to be sorted or partitioned in advance.
· A custom WritableComparable key is needed, with the necessary methods overridden.
· A custom partitioner is needed.
· A custom group comparator is required.
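To show how these pieces might fit together, here is a plain-Python simulation of a reduce-side join. It is only a sketch under stated assumptions and does not use Hadoop classes: the tuple (join_key, tag) stands in for the custom WritableComparable, partition_for for the custom partitioner, and the groupby on the join key for the group comparator. All names, the tag values, and NUM_REDUCERS are hypothetical.

```python
# Plain-Python simulation of a reduce-side join; no Hadoop APIs are used.
from itertools import groupby

NUM_REDUCERS = 2  # assumed value, for illustration only

def map_tagged(records, tag):
    """'Mapper': emit a composite key (join_key, tag) for every record."""
    for join_key, value in records:
        yield (join_key, tag), value

def partition_for(composite_key):
    """'Custom partitioner': route on the join key only, ignoring the tag."""
    join_key, _tag = composite_key
    return hash(join_key) % NUM_REDUCERS

def reduce_side_join(left, right):
    # Tag 0 sorts before tag 1, so left-side records reach the reducer first.
    emitted = list(map_tagged(left, 0)) + list(map_tagged(right, 1))

    # 'Shuffle': send each pair to the reducer chosen by the partitioner.
    reducers = [[] for _ in range(NUM_REDUCERS)]
    for key, value in emitted:
        reducers[partition_for(key)].append((key, value))

    out = []
    for pairs in reducers:
        pairs.sort(key=lambda kv: kv[0])  # sort on the full composite key
        # 'Group comparator': one reduce call per join key, tags ignored.
        for join_key, group in groupby(pairs, key=lambda kv: kv[0][0]):
            left_values = []
            for (_key, tag), value in group:
                if tag == 0:
                    left_values.append(value)
                else:
                    for lv in left_values:
                        out.append((join_key, lv, value))
    return out

# Hypothetical sample data: (dept_id, dept_name) and (dept_id, employee)
departments = [(1, "sales"), (2, "engineering")]
employees = [(2, "bob"), (1, "ann"), (3, "cid"), (1, "dee")]

joined = reduce_side_join(departments, employees)
```

Partitioning and grouping on the join key alone, while sorting on the full composite key, is what lets records from both data sets for one key arrive in a single reduce call in a predictable order.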