Created 11-11-2016 01:50 PM
I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer:
Role of InputSplit and RecordReader in the map phase
When the shuffle/sort phase runs
The partition phase
How the data gets to the reducer
Created 11-11-2016 04:09 PM
Map-reduce
This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/
To really understand it, you need to dive deep. For example, each map task writes its output to an in-memory buffer, which spills to local disk when it fills up; this intermediate data is then sent across the network to the reducer(s). To really understand map-reduce (so you can optimize performance), reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
You can write your own map-reduce programs, but more typically they are generated for you when you run a hive or pig job.
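To tie the phases in the question together, here is a minimal single-process sketch in Python of a word count going from input splits to reducers. It is only an illustration, not Hadoop code: the function names (`record_reader`, `mapper`, `partition`, `reducer`, `run_job`) are made up for this example, everything runs in memory instead of spilling to disk and crossing the network, and the partition function is a simple stand-in for Hadoop's default hash partitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.

```python
from collections import defaultdict


def record_reader(split):
    """Turn one input split (a chunk of text) into (byte offset, line) records."""
    offset = 0
    for line in split.splitlines():
        yield offset, line
        offset += len(line) + 1  # +1 for the newline character


def mapper(_offset, line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Pick the reducer for a key. Stand-in for a hash partitioner:
    the same key always lands on the same reducer."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce: sum all the counts collected for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # Map phase: one "map task" per split; each output pair is
    # assigned to a partition (one partition per reducer).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for offset, line in record_reader(split):
            for key, value in mapper(offset, line):
                partitions[partition(key, num_reducers)][key].append(value)

    # Shuffle/sort phase: each reducer gets its own partition with the
    # values grouped by key, and processes the keys in sorted order.
    results = {}
    for part in partitions:
        for key in sorted(part):
            out_key, out_value = reducer(key, part[key])
            results[out_key] = out_value
    return results


counts = run_job(["the quick brown fox\nthe lazy dog", "the dog barks"])
print(counts)
```

In a real cluster the partitioning and sorting happen on the map side as the buffer spills, and the reducers pull their partitions over the network, but the order of the steps is the same as in this sketch.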
Tez
If you are running hive or pig queries, you should run them in tez mode. Tez is an alternative execution engine to map-reduce and is typically much faster for these queries.
See:
Created 11-14-2016 03:53 PM
Hi Amith,
Please refer to this video; it explains the MapReduce flow chart very well. I hope it will be useful.
https://www.youtube.com/watch?v=6OemZEJdMp8
Thanks,
Mahesh