Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Easy explanation of the MapReduce phases: from InputSplit to Reducer

Expert Contributor

I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer:

Role of InputSplit and RecordReader in the map phase

When the shuffle/sort phase runs

The partition phase

How the data gets to the reducer

1 ACCEPTED SOLUTION

Guru

Map-reduce

This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/

To really understand it, you need to dive deeper. For example, the mapper stage writes its output to an in-memory buffer, which spills to local disk as it fills; this intermediate data is then sent across the network to the reducer(s). To really understand MapReduce (so you can optimize performance), reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
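To make that flow concrete, here is a rough sketch in plain Python. It is a toy simulation, not the Hadoop API: the function names, the two reducers, and the record-count spill threshold are all assumptions made for illustration (Hadoop's real buffer is sized in bytes and spills in the background). It walks one word-count job through the steps asked about: a record reader turns a split into records, the mapper emits key/value pairs, each pair is assigned a partition, the buffer is sorted and spilled when full, the spills are merged, and each reducer fetches and merges its own partition from every map output.

```python
import heapq
import zlib
from itertools import groupby

NUM_REDUCERS = 2     # assumed for illustration
SPILL_THRESHOLD = 4  # records per spill; a stand-in for Hadoop's byte-sized buffer


def record_reader(split):
    """Turn one input split (here, a list of lines) into (offset, line) records."""
    offset = 0
    for line in split:
        yield offset, line
        offset += len(line) + 1


def mapper(_offset, line):
    """Word-count map function: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key):
    """Choose the reducer for a key using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def run_map_task(split):
    """Map one split, spilling a sorted run each time the buffer fills up."""
    buffer, spills = [], []
    for offset, line in record_reader(split):
        for key, value in mapper(offset, line):
            buffer.append((partition(key), key, value))
            if len(buffer) >= SPILL_THRESHOLD:
                spills.append(sorted(buffer))  # sorted by (partition, key)
                buffer = []
    if buffer:
        spills.append(sorted(buffer))
    # Merge all sorted spills into a single sorted map output.
    return list(heapq.merge(*spills))


def shuffle(map_outputs, reducer_id):
    """Reducer side: take this reducer's partition from every map output and merge."""
    runs = [[(k, v) for p, k, v in out if p == reducer_id] for out in map_outputs]
    return heapq.merge(*runs)


def reducer(key, values):
    """Word-count reduce function: sum the counts for one key."""
    return key, sum(values)


def run_job(splits):
    """Run one map task per split, then one reduce task per partition."""
    map_outputs = [run_map_task(s) for s in splits]
    result = {}
    for reducer_id in range(NUM_REDUCERS):
        merged = shuffle(map_outputs, reducer_id)
        # The merged stream is sorted by key, so equal keys are adjacent.
        for key, group in groupby(merged, key=lambda kv: kv[0]):
            word, total = reducer(key, (v for _, v in group))
            result[word] = total
    return result


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["the dog barks"]]
    print(run_job(splits))
```

The point to take away is the ordering: partitioning and sorting happen on the map side before anything crosses the network, and the reducer only merges runs that are already sorted, which is why all values for a key arrive together.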

You can write your own MapReduce programs, but more typically the MapReduce jobs are generated for you when you run a Hive or Pig job.

Tez

If you are running Hive or Pig queries, you should run them in Tez mode. Tez is an alternative processing engine to MapReduce and is generally much faster for these queries.

See:

http://hortonworks.com/apache/tez/

http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha


2 REPLIES

Super Collaborator

Hi Amith,

Please refer to this video; it explains the MapReduce flow chart very well. I hope it is useful.

https://www.youtube.com/watch?v=6OemZEJdMp8

Thanks,

Mahesh