
Easy explanation of the MapReduce phases: from InputSplit to Reducer


Expert Contributor

I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer. In particular:

The role of InputSplit and RecordReader in the map phase

When the shuffle/sort phase runs

The partition phase

How the data gets to the reducer

1 ACCEPTED SOLUTION


Re: Easy explanation of the MapReduce phases: from InputSplit to Reducer

Guru

Map-reduce

This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/

To really understand it, you need to dive deeper. For example, each mapper writes its output to an in-memory buffer, which spills to local disk when it fills up; this intermediate data is then sent across the network to the reducer(s). To understand MapReduce well enough to optimize performance, reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
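To make the flow from InputSplit to reducer concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: every function name is a hypothetical stand-in, the record key is a line position rather than the byte offset Hadoop's text input uses, and the partitioner uses a CRC32 hash where Hadoop's default hashes the key's `hashCode()`. It only illustrates the order of the steps.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import zlib


def make_splits(lines, lines_per_split):
    """InputSplit stand-in: one logical chunk of input per map task."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]


def record_reader(split):
    """RecordReader stand-in: turn a split into (key, value) records."""
    for position, line in enumerate(split):
        yield position, line


def map_fn(_key, line):
    """User map function: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Partitioner stand-in: decide which reducer receives a key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(split, num_reducers):
    """One map task: read records, map them, then partition and sort the output."""
    buckets = defaultdict(list)
    for key, value in record_reader(split):
        for out_key, out_value in map_fn(key, value):
            buckets[partition(out_key, num_reducers)].append((out_key, out_value))
    # Each partition is sorted by key before it is handed to the reducers.
    return {part: sorted(pairs) for part, pairs in buckets.items()}


def reduce_fn(key, values):
    """User reduce function: sum the counts for one key."""
    return key, sum(values)


def run_job(lines, lines_per_split=2, num_reducers=2):
    map_outputs = [run_map_task(split, num_reducers)
                   for split in make_splits(lines, lines_per_split)]
    results = {}
    for reducer_id in range(num_reducers):
        # Shuffle: this reducer fetches its partition from every map task.
        fetched = []
        for output in map_outputs:
            fetched.extend(output.get(reducer_id, []))
        # Sort/merge: bring equal keys together before reducing.
        fetched.sort(key=itemgetter(0))
        for key, group in groupby(fetched, key=itemgetter(0)):
            out_key, out_value = reduce_fn(key, [value for _, value in group])
            results[out_key] = out_value
    return results
```

Calling `run_job(["a b a", "b c", "a"])` runs two map tasks and two reducers and returns the word counts as a dictionary. In a real cluster the map output sits on the mappers' local disks and the shuffle moves it over the network, which is the part worth studying when tuning performance.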

You can write your own MapReduce programs, but more typically the MapReduce jobs are generated for you when you run a Hive or Pig job.
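If you do write your own, the part you supply is small. The sketch below shows a word count in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key<TAB>value` lines and the framework sorts between them. Hadoop Streaming is not mentioned in this thread, so treat it as an assumption; the function names `mapper` and `reducer` are hypothetical, and the functions take plain iterables of lines rather than reading standard input so they can be tried locally.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

Locally, `reducer(sorted(mapper(lines)))` imitates the whole job, with `sorted` standing in for the shuffle/sort that the framework performs between the two functions.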

Tez

If you are running Hive or Pig queries, you should run them in Tez mode. Tez is an alternative processing engine to MapReduce that is generally much faster.

See:

http://hortonworks.com/apache/tez/

http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha


Re: Easy explanation of the MapReduce phases: from InputSplit to Reducer

Expert Contributor

Hi Amith,

Please refer to this video; it explains the MapReduce flow chart very well. I hope it is useful.

https://www.youtube.com/watch?v=6OemZEJdMp8

Thanks,

Mahesh
