Differentiate between Map Side join and Reduce side Join in Hadoop?
Labels: Apache Hadoop, Apache Hive
Created 03-30-2019 11:50 AM
Created 04-03-2019 12:08 PM
Hadoop supports two kinds of joins for combining two or more data sets on a common column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small, whereas a reduce-side join can join two large data sets.
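For the common "one large, one small" case, the map-side join can be pictured with the single-process Python sketch below. It only simulates what one mapper does; the function name `map_side_join` and the sample data are hypothetical, not part of Hadoop's API. Each mapper holds the whole small data set in memory as a lookup table and streams its split of the large data set through it, so no shuffle or reduce phase is needed.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (join key, value)

def map_side_join(small: Iterable[Record],
                  large_split: Iterable[Record]) -> Iterator[Tuple[str, str, str]]:
    """Hypothetical simulation of one mapper in a map-side (replicated) join.

    Every mapper builds the same in-memory lookup table from the small data
    set (in Hadoop this is typically shipped via the distributed cache), then
    streams its own split of the large data set through it.
    """
    lookup: Dict[str, List[str]] = {}
    for key, value in small:
        lookup.setdefault(key, []).append(value)
    for key, big_value in large_split:
        # Inner join: records whose key is absent from the small set are dropped.
        for small_value in lookup.get(key, []):
            yield key, big_value, small_value

departments = [("d1", "Sales"), ("d2", "R&D")]          # small data set
employee_split = [("d1", "alice"), ("d2", "bob"), ("d3", "carol")]  # one split of the large set
print(list(map_side_join(departments, employee_split)))
# [('d1', 'alice', 'Sales'), ('d2', 'bob', 'R&D')]
```

The approach only works while the small data set fits in each mapper's memory, which is why it is reserved for the large-versus-small case.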
The map-side join is faster because the join is done entirely in the mappers: there is no shuffle and sort, and no reduce phase that has to wait for all mappers to complete. The reduce-side join is therefore slower.
Map Side Join:
· Both inputs must be sorted by the same key.
· Both inputs must have an equal number of partitions.
· All records with the same key must be in the same partition.
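The requirements listed for the map-side join exist so that each mapper can merge one partition of each input without any shuffle. The Python sketch below simulates that under stated assumptions: both inputs are already split into the same number of partitions by the same partitioner, and every partition is sorted by key. The helpers `merge_join_partition` and `map_side_merge_join` are hypothetical names; Hadoop's own `CompositeInputFormat` applies the same idea to real input splits.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (join key, value)

def merge_join_partition(left: List[Record], right: List[Record]) -> List[Tuple[str, str, str]]:
    """Inner-join two partitions that are both sorted by key (one mapper's work)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

def map_side_merge_join(left_parts: List[List[Record]],
                        right_parts: List[List[Record]]) -> List[Tuple[str, str, str]]:
    # Requirement: equal partition counts; partition p of both inputs holds the same keys.
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    result = []
    for left, right in zip(left_parts, right_parts):  # one mapper per partition pair
        result.extend(merge_join_partition(left, right))
    return result

left_parts = [[("a", "1"), ("c", "3")], [("b", "2"), ("b", "2x")]]
right_parts = [[("a", "x"), ("c", "z")], [("b", "y")]]
print(map_side_merge_join(left_parts, right_parts))
# [('a', '1', 'x'), ('c', '3', 'z'), ('b', '2', 'y'), ('b', '2x', 'y')]
```

If any of the three conditions is violated (unsorted partitions, mismatched partition counts, or a key split across partitions), the merge silently misses matches, which is why the inputs must be prepared this way beforehand.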
Reduce Side Join:
· Much more flexible to implement.
· Requires a custom WritableComparable (composite key) with the necessary functions overridden.
· Requires a custom partitioner.
· Requires a custom grouping comparator.
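The three custom pieces listed for the reduce-side join can be simulated in plain Python, as a rough sketch and not Hadoop code. The assumptions: the composite key is a tuple `(join key, source tag)` standing in for a custom WritableComparable, sorting by the full tuple plays the role of its `compareTo`, a hypothetical `partition` function hashes only the join key (the custom partitioner), and `itertools.groupby` on the join key alone plays the role of the grouping comparator. Tag 0 marks the left input so its records reach the reducer first and can be buffered.

```python
import zlib
from itertools import groupby
from typing import Dict, List, Tuple

CompositeKey = Tuple[str, int]  # (join key, source tag); tag 0 = left input, 1 = right input

def mapper(records, tag):
    # Tag each record with its source so the reducer can tell the inputs apart.
    for key, value in records:
        yield (key, tag), value

def partition(ck: CompositeKey, num_reducers: int) -> int:
    # Custom partitioner: hash only the join key, so both tags of a key reach one reducer.
    return zlib.crc32(ck[0].encode()) % num_reducers

def reducer(sorted_pairs):
    # Grouping comparator: group on the join key alone, ignoring the tag.
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        left_values = []
        for (_, tag), value in group:
            if tag == 0:
                left_values.append(value)  # tag 0 sorts first, so buffered before right records
            else:
                for lv in left_values:
                    yield key, lv, value

def reduce_side_join(left, right, num_reducers=2):
    partitions: Dict[int, List] = {r: [] for r in range(num_reducers)}
    for ck, value in list(mapper(left, 0)) + list(mapper(right, 1)):
        partitions[partition(ck, num_reducers)].append((ck, value))
    out = []
    for r in range(num_reducers):
        # Shuffle/sort: order by the full composite key (join key, then tag).
        pairs = sorted(partitions[r], key=lambda kv: kv[0])
        out.extend(reducer(pairs))
    return out

departments = [("d1", "Sales"), ("d2", "R&D")]
employees = [("d1", "alice"), ("d2", "bob"), ("d1", "dave"), ("d3", "carol")]
print(sorted(reduce_side_join(departments, employees)))
# [('d1', 'Sales', 'alice'), ('d1', 'Sales', 'dave'), ('d2', 'R&D', 'bob')]
```

Partitioning and grouping on the join key while sorting on the full composite key is what lets the reducer see every record for a key in one call, with the left-side records first; this is also why the reduce-side join pays for a full shuffle and sort.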
