Differentiate between Map Side join and Reduce side Join in Hadoop?

1 REPLY

Re: Differentiate between Map Side join and Reduce side Join in Hadoop?

Hadoop supports two kinds of joins for combining two or more data sets on a common key: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small enough to be held in memory by every mapper. A reduce-side join has no such restriction and can join two large data sets.

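The map-side join is faster because the join is done inside the mappers, so no data has to pass through the shuffle and sort phase. In a reduce-side join the reducers cannot start joining until all mappers have completed and their output has been shuffled and sorted, so it is slower.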
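
To make the "one large, one small" case concrete, here is a minimal sketch in plain Java that only simulates the idea; it uses no Hadoop classes. The class name, the method names (`loadSmallSide`, `mapJoin`) and the sample records are hypothetical. It assumes the small data set fits in memory, is loaded once before the map calls, and is then probed for every record of the large data set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a map-side join with one small input; no Hadoop classes are used.
class MapSideHashJoinSketch {

    // Hypothetical "setup" step: load the small data set (key -> value) into memory.
    static Map<String, String> loadSmallSide(List<String[]> smallRecords) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] rec : smallRecords) {
            lookup.put(rec[0], rec[1]);
        }
        return lookup;
    }

    // Hypothetical "map" step: every large-side record is joined on the spot,
    // so nothing is shuffled or sorted and no reducer is involved.
    static List<String> mapJoin(List<String[]> largeRecords, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] rec : largeRecords) {
            String match = lookup.get(rec[0]);
            if (match != null) { // inner join: records without a match are dropped
                out.add(rec[0] + "," + rec[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> departments = List.of(
                new String[] {"d1", "Sales"},
                new String[] {"d2", "Engineering"});
        List<String[]> employees = List.of(
                new String[] {"d1", "alice"},
                new String[] {"d2", "bob"},
                new String[] {"d3", "carol"});
        mapJoin(employees, loadSmallSide(departments)).forEach(System.out::println);
    }
}
```

In a real job the lookup table would be built in the mapper's setup phase from a file shipped to every node; that wiring is left out here.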

Map Side Join:

· Both data sets must be sorted by the same join key.
· Both data sets must have an equal number of partitions.
· All the records for a given key must be in the same partition.
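
The three conditions above are what allow a join without a reducer even when neither input fits in memory: partition n of one input only has to be merged with partition n of the other. The following is a rough plain-Java sketch of that merge, not Hadoop code; the class and method names and the sample data are made up for illustration. In Hadoop itself this style of join is typically set up through `CompositeInputFormat`, which the sketch does not use.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of a map-side merge join over sorted, identically partitioned inputs.
class SortedPartitionMergeJoinSketch {

    // Joins one pair of matching partitions. Both lists must already be sorted by key (element 0).
    static List<String> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key has no partner on the right
            } else if (cmp > 0) {
                j++; // right key has no partner on the left
            } else {
                String key = left.get(i)[0];
                int jEnd = j; // find the end of the run of right records with this key
                while (jEnd < right.size() && right.get(jEnd)[0].equals(key)) {
                    jEnd++;
                }
                // pair every left record with this key with every right record in the run
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        out.add(key + "," + left.get(i)[1] + "," + right.get(k)[1]);
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return out;
    }

    // Partition n of the left input is joined only with partition n of the right input.
    static List<String> joinAllPartitions(List<List<String[]>> leftParts,
                                          List<List<String[]>> rightParts) {
        if (leftParts.size() != rightParts.size()) {
            throw new IllegalArgumentException("inputs must have an equal number of partitions");
        }
        List<String> out = new ArrayList<>();
        for (int p = 0; p < leftParts.size(); p++) {
            out.addAll(mergeJoin(leftParts.get(p), rightParts.get(p)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String[]>> customers = List.of(
                List.of(new String[] {"a", "Ann"}, new String[] {"b", "Ben"}),
                List.of(new String[] {"c", "Cid"}, new String[] {"d", "Dee"}));
        List<List<String[]>> orders = List.of(
                List.of(new String[] {"a", "order-1"}, new String[] {"a", "order-2"}),
                List.of(new String[] {"d", "order-3"}, new String[] {"e", "order-4"}));
        joinAllPartitions(customers, orders).forEach(System.out::println);
    }
}
```

If any of the three conditions is violated (for example, key "d" sits in partition 0 on one side and partition 1 on the other), matching records never meet and the join silently loses rows.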

Reduce Side Join:

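· More flexible to implement, since it places no requirements on how the inputs are sorted or partitioned.
· Usually needs a custom WritableComparable key with the necessary methods (write, readFields, compareTo) overridden.
· Needs a custom partitioner.
· Needs a custom grouping comparator.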
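
How the pieces listed for the reduce-side join fit together can be sketched in plain Java, again as a simulation rather than real Hadoop code. Everything named here (`Tagged`, `partition`, `SORT_ORDER`, `join`) is hypothetical: `Tagged` stands in for the custom WritableComparable (join key plus a tag for the source data set), `partition` for the custom partitioner, and the inner grouping loop for the custom grouping comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // Stand-in for a custom composite key plus its value.
    static final class Tagged {
        final String joinKey;
        final int tag;      // 0 = left data set, 1 = right data set
        final String value;

        Tagged(String joinKey, int tag, String value) {
            this.joinKey = joinKey;
            this.tag = tag;
            this.value = value;
        }
    }

    // "Partitioner": route by join key only, so both sides of a key meet in one reducer.
    static int partition(String joinKey, int numReducers) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // "Sort comparator": join key first, then tag, so left records arrive before right ones.
    static final Comparator<Tagged> SORT_ORDER =
            Comparator.comparing((Tagged t) -> t.joinKey).thenComparingInt(t -> t.tag);

    static List<String> join(List<Tagged> mapOutput, int numReducers) {
        // "Shuffle": distribute all map output across the reducers.
        List<List<Tagged>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            reducers.add(new ArrayList<>());
        }
        for (Tagged t : mapOutput) {
            reducers.get(partition(t.joinKey, numReducers)).add(t);
        }

        List<String> out = new ArrayList<>();
        for (List<Tagged> input : reducers) {
            input.sort(SORT_ORDER);
            int i = 0;
            while (i < input.size()) {
                String key = input.get(i).joinKey;
                List<String> leftValues = new ArrayList<>();
                // "Grouping comparator": one reduce call sees every record with this join key,
                // whatever its tag.
                while (i < input.size() && input.get(i).joinKey.equals(key)) {
                    Tagged t = input.get(i++);
                    if (t.tag == 0) {
                        leftValues.add(t.value);
                    } else {
                        for (String left : leftValues) {
                            out.add(key + "," + left + "," + t.value);
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> mapOutput = List.of(
                new Tagged("d1", 1, "alice"),
                new Tagged("d2", 0, "Engineering"),
                new Tagged("d1", 0, "Sales"),
                new Tagged("d2", 1, "bob"));
        join(mapOutput, 2).forEach(System.out::println);
    }
}
```

Sorting on the tag is what lets the reduce step buffer only the left-side values for a key. The order of the output lines depends on which reducer each key hashes to.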