question Re: Easy explanation of the MapReduce phases: from InputSplit to Reducer, in Archives of Support Questions (Read Only)

Easy explanation of the MapReduce phases: from InputSplit to Reducer

amit_dass — Fri, 11 Nov 2016 21:50:36 GMT

I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer. In particular:

The role of InputSplit and RecordReader in the map phase

When the shuffle/sort phase runs

The partition phase

How the data gets to the reducer

Re: Easy explanation of the MapReduce phases: from InputSplit to Reducer

gkeys — Sat, 12 Nov 2016 00:09:18 GMT

Map-reduce

This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/

To really understand it, you need to dive deeper. For example, the mapper stage writes its output to an in-memory buffer, which spills to local disk when it fills up; this intermediate data is then sent across the network to the reducer(s). To understand MapReduce well enough to optimize performance, reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
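To make the flow from InputSplit to reducer a bit more concrete, here is a minimal in-memory sketch in Python. It is not Hadoop code: the names `record_reader`, `map_fn`, `partition`, `reduce_fn` and `run_job` are made up for illustration, `zlib.crc32` stands in for the key hash that a hash partitioner would use, and there is no real buffer spill or network transfer. It only shows the order of the steps: each split is read record by record, the mapper emits key/value pairs, each pair is assigned to a partition (one per reducer), the pairs are sorted by key, and each reducer receives all values for a given key together.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def record_reader(split):
    """Turn one input split (a chunk of text) into (offset, line) records."""
    offset = 0
    for line in split.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line)


def map_fn(offset, line):
    """Word-count mapper: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Pick the reducer for a key (crc32 is an assumed stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(key, values):
    """Word-count reducer: sum all counts seen for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # Map phase: one map task per split. Each task buckets its output
    # by partition and sorts every bucket by key.
    map_outputs = []
    for split in splits:
        buckets = [[] for _ in range(num_reducers)]
        for offset, line in record_reader(split):
            for key, value in map_fn(offset, line):
                buckets[partition(key, num_reducers)].append((key, value))
        for bucket in buckets:
            bucket.sort(key=itemgetter(0))
        map_outputs.append(buckets)

    # Shuffle/sort: reducer r collects partition r from every map task
    # and merges the pieces into one key-sorted stream.
    results = {}
    for r in range(num_reducers):
        fetched = [pair for buckets in map_outputs for pair in buckets[r]]
        fetched.sort(key=itemgetter(0))
        # Reduce phase: all values for the same key arrive together.
        for key, group in groupby(fetched, key=itemgetter(0)):
            out_key, out_value = reduce_fn(key, (v for _, v in group))
            results[out_key] = out_value
    return results
```

Calling `run_job(["the cat sat\nthe dog", "the end\ncat nap"])` treats the two strings as two input splits and returns the word counts as a dictionary. Because a key always hashes to the same partition, every occurrence of a word ends up at the same reducer regardless of which split it came from.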

You can write your own MapReduce programs, but more typically the MapReduce jobs are generated for you when you run a Hive or Pig job.
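If you do write your own, one low-effort route is Hadoop Streaming, where the mapper and reducer are ordinary scripts that read lines and emit tab-separated key/value lines. The sketch below shows only the logic such scripts might contain, written as plain functions so it can be tried locally; the function names `mapper` and `reducer` are illustrative, and wiring them to stdin/stdout and submitting the job is left out.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

Locally, `list(reducer(sorted(mapper(["b a", "a"]))))` plays the role of the whole job: `sorted` stands in for the shuffle/sort step that the framework performs between the two scripts.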

Tez

If you are running Hive or Pig queries, you should run them in Tez mode. Tez is an alternative processing engine to MapReduce and is typically much faster.

See:

http://hortonworks.com/apache/tez/

http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha

Re: Easy explanation of the MapReduce phases: from InputSplit to Reducer

maheshmsh88 — Mon, 14 Nov 2016 23:53:00 GMT

Hi Amith,

Please refer to this video; the presenter explains the MapReduce flow chart very well. I hope this is useful.

https://www.youtube.com/watch?v=6OemZEJdMp8

Thanks,

Mahesh