Easy explanation of the MapReduce phases: from InputSplit to Reducer

Expert Contributor

I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer:

The role of InputSplit and RecordReader in the map phase

When the shuffle/sort phase runs

The partition phase

How the data gets to the reducer

1 ACCEPTED SOLUTION

Guru

Map-reduce

This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/

To really understand it, you need to dive deep. For example, each mapper writes its output to an in-memory buffer, which spills to local disk as it fills; this intermediate data is then sent across the network to the reducer(s). To really understand MapReduce (so you can optimize performance), reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
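To make the flow from InputSplit to reducer concrete, here is a minimal in-memory sketch of a word count in plain Python. It is not the Hadoop API: every function name (`input_splits`, `record_reader`, `run_map_task`, `shuffle`, and so on) is hypothetical and chosen only to mirror the phases asked about, and the disk spill and network transfer are replaced by ordinary lists and dicts. The CRC32-based partitioner is likewise an assumption, used so that a given key always lands on the same reducer.

```python
from collections import defaultdict
import zlib


def input_splits(lines, split_size):
    """Cut the input into chunks; each chunk stands in for one InputSplit."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def record_reader(split):
    """Turn one split into (key, value) records: (line index, line text)."""
    for index, line in enumerate(split):
        yield index, line


def map_fn(_key, line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Partition: pick the reducer for a key (deterministic hash, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(split, num_reducers):
    """One mapper: read records, map them, partition the output, sort each partition."""
    partitions = defaultdict(list)
    for key, value in record_reader(split):
        for out_key, out_value in map_fn(key, value):
            partitions[partition(out_key, num_reducers)].append((out_key, out_value))
    # Sorting here plays the role of the sorted map output that is spilled to disk.
    return {p: sorted(pairs) for p, pairs in partitions.items()}


def shuffle(map_outputs, reducer_id):
    """Shuffle: collect this reducer's partition from every mapper, merge, group by key."""
    merged = []
    for output in map_outputs:
        merged.extend(output.get(reducer_id, []))
    merged.sort()
    grouped = defaultdict(list)
    for key, value in merged:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run_job(lines, split_size=2, num_reducers=2):
    """Run all map tasks, then run each reducer on its shuffled input."""
    map_outputs = [run_map_task(s, num_reducers) for s in input_splits(lines, split_size)]
    results = {}
    for reducer_id in range(num_reducers):
        for key, values in shuffle(map_outputs, reducer_id).items():
            word, total = reduce_fn(key, values)
            results[word] = total
    return results


if __name__ == "__main__":
    print(run_job(["a b a", "b c", "a"]))
```

Reading `run_job` top to bottom gives the order of the phases: splits are created, each mapper reads its split through the record reader, map output is partitioned and sorted on the map side, each reducer pulls its own partition from every mapper, and only then does the reduce function run.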

You can write your own MapReduce programs, but more commonly the MapReduce jobs are generated for you when you run a Hive or Pig job.

Tez

If you are running Hive or Pig queries, you should run them in Tez mode. Tez is an alternative processing engine to MapReduce and is typically much faster.

See:

http://hortonworks.com/apache/tez/

http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha

2 REPLIES

Expert Contributor

Hi Amith,

Please refer to this video; it explains the MapReduce flow chart very well. I hope it is useful.

https://www.youtube.com/watch?v=6OemZEJdMp8

Thanks,

Mahesh