Support Questions

patelharshali13 · 03-30-2019

What is the difference between a map-side join and a reduce-side join in Hadoop?

bansal_himani13 · 04-03-2019

Hadoop supports two kinds of joins for combining two or more data sets on a common key column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small, whereas a reduce-side join can join two large data sets.
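When one data set is small, a common form of map-side join loads the small data set into an in-memory hash table in every mapper (in Hadoop it is typically shipped to each node through the distributed cache) and probes that table for each record of the large data set. The sketch below imitates this in plain Python rather than the Hadoop API; the record shape and the `customers`/`orders` data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (join_key, value); hypothetical record shape


def build_lookup(small: Iterable[Record]) -> Dict[str, List[str]]:
    """Load the whole small data set into memory, as each mapper would in setup()."""
    table: Dict[str, List[str]] = {}
    for key, value in small:
        table.setdefault(key, []).append(value)
    return table


def map_side_join(split: Iterable[Record],
                  lookup: Dict[str, List[str]]) -> Iterator[Tuple[str, str, str]]:
    """Map function for one split of the large data set: an inner join, no reducer."""
    for key, big_value in split:
        for small_value in lookup.get(key, []):
            yield key, big_value, small_value


if __name__ == "__main__":
    customers = [("c1", "Alice"), ("c2", "Bob")]                        # small side
    orders = [("c1", "order-1"), ("c3", "order-3"), ("c1", "order-4")]  # large side
    lookup = build_lookup(customers)
    for row in map_side_join(orders, lookup):
        print(row)
```

Because this is an inner join, `order-3` (key `c3`) has no matching customer and is dropped. Each mapper works on its own split independently, so no shuffle is needed.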

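A map-side join is faster because the join is done entirely in the map phase: there is no reduce phase, so neither data set has to be shuffled and sorted across the network, and nothing has to wait for all mappers to finish before reducers can start. A reduce-side join pays both of these costs, which is why it is slower.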

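Map-Side Join:

If the small data set fits in memory, it can simply be loaded into every mapper. Otherwise, both inputs must satisfy these conditions:

· Both are sorted by the same (join) key.
· Both have an equal number of partitions.
· All the records with the same key are in the corresponding (same-numbered) partition of each input.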
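When the inputs meet these conditions, mapper k can read partition k of both inputs and join them with a single forward pass over each, like a sort-merge join, without holding either partition in memory. Hadoop provides `CompositeInputFormat` for this case; the sketch below only imitates the idea in plain Python. The `co_partition` helper and its CRC32-based partitioning are assumptions chosen so that both inputs are split the same way.

```python
import zlib
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (join_key, value)


def co_partition(records: List[Record], n: int) -> List[List[Record]]:
    """Hypothetical helper: split by key into n partitions and sort each by key.
    Both inputs must use this same function and the same n."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for key, value in records:
        parts[zlib.crc32(key.encode()) % n].append((key, value))
    return [sorted(p) for p in parts]


def merge_join(left: List[Record], right: List[Record]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join two key-sorted partitions with one forward pass over each."""
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on the right side ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # ... and pair it with every left record that has the same key.
            while i < len(left) and left[i][0] == lkey:
                for _, rvalue in right[j:j_end]:
                    yield lkey, left[i][1], rvalue
                i += 1
            j = j_end


if __name__ == "__main__":
    n = 2
    orders = co_partition([("c1", "order-1"), ("c2", "order-2"), ("c3", "order-3")], n)
    customers = co_partition([("c1", "Alice"), ("c2", "Bob")], n)
    for k in range(n):  # mapper k reads only partition k of each input
        for row in merge_join(orders[k], customers[k]):
            print(k, row)
```

If the two inputs were partitioned differently, a key could land in partition 0 of one input and partition 1 of the other, and that match would silently be missed; this is why all three conditions are needed.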

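Reduce-Side Join:

· Much more flexible to implement, since the inputs need no prior sorting or partitioning.
· There has to be a custom WritableComparable key (typically the join key plus a tag identifying the source data set) with the necessary methods overridden.
· We need a custom partitioner that partitions on the join key alone.
· A custom grouping comparator is required so that all records for one join key reach a single reduce call.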
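The three custom pieces above cooperate during the shuffle: mappers emit a composite (join key, tag) key, the partitioner hashes only the join key, the sort orders by the full composite key, and the grouping comparator groups by the join key only. The sketch below simulates that pipeline in plain Python; it is not the Hadoop API (in a Java job these pieces would be registered with calls such as `Job.setPartitionerClass` and `Job.setGroupingComparatorClass`), and the tags and data sets are hypothetical.

```python
import zlib
from itertools import groupby
from typing import Iterator, List, Tuple

Record = Tuple[str, str]        # (join_key, value) as read from an input
Composite = Tuple[str, int]     # (join_key, tag): the custom composite key
Tagged = Tuple[Composite, str]  # (composite_key, value) as emitted by a mapper

CUSTOMER, ORDER = 0, 1  # hypothetical source tags; the lower tag sorts first


def map_phase(customers: List[Record], orders: List[Record]) -> List[Tagged]:
    """Mappers emit every record of both inputs, tagged with its source."""
    return ([((k, CUSTOMER), v) for k, v in customers]
            + [((k, ORDER), v) for k, v in orders])


def partition_on_join_key(records: List[Tagged], n: int) -> List[List[Tagged]]:
    """Custom partitioner: hash only the join key, so the tag cannot split a key."""
    parts: List[List[Tagged]] = [[] for _ in range(n)]
    for (key, tag), value in records:
        parts[zlib.crc32(key.encode()) % n].append(((key, tag), value))
    return parts


def reduce_partition(part: List[Tagged]) -> Iterator[Tuple[str, str, str]]:
    """Sort on the full composite key, then group on the join key alone."""
    ordered = sorted(part, key=lambda rec: rec[0])                  # sort: (join_key, tag)
    for key, group in groupby(ordered, key=lambda rec: rec[0][0]):  # group: join_key only
        names: List[str] = []
        for (_, tag), value in group:
            if tag == CUSTOMER:
                names.append(value)  # customer records arrive first; buffer them
            else:
                for name in names:   # order records are streamed, never buffered
                    yield key, value, name


def reduce_side_join(customers: List[Record], orders: List[Record],
                     n: int = 2) -> List[Tuple[str, str, str]]:
    """Run map, shuffle (partition + sort) and reduce for an inner join."""
    shuffled = partition_on_join_key(map_phase(customers, orders), n)
    return [row for part in shuffled for row in reduce_partition(part)]


if __name__ == "__main__":
    rows = reduce_side_join([("c1", "Alice"), ("c2", "Bob")],
                            [("c1", "order-1"), ("c3", "order-3"), ("c1", "order-4")])
    for row in rows:
        print(row)
```

Putting the tag in the key means each reducer only has to buffer the records of the side that sorts first for one join key at a time, while the other side is streamed through. That is the reason for the composite key rather than sorting values inside the reducer.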

Cloudera Community
