Support Questions

amit_dass · ‎11-11-2016

I am looking for an easy explanation of the MapReduce phases, from InputSplit to Reducer.

The role of InputSplit and RecordReader in the map phase

When the shuffle/sort phase runs

The partition phase

How the data gets to the reducer

gkeys · ‎11-11-2016

Map-reduce

This is a good high-level (easy) explanation: http://www.thegeekstuff.com/2014/05/map-reduce-algorithm/

To really understand it, you need to dive deeper. For example, the map stage writes its output to an in-memory buffer, which spills to local disk when it fills up; this intermediate data is then sent across the network to the reducer(s). To really understand MapReduce (so you can optimize performance), reading this book is a good way to go: http://shop.oreilly.com/product/0636920033448.do
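To make the flow from InputSplit to Reducer concrete, here is a minimal in-memory sketch in plain Python of a word count job. It is not Hadoop code: every function name (input_splits, record_reader, partition, run_map_task, shuffle, run_job) is hypothetical and chosen only to mirror the stage it stands in for. It assumes a split is just a chunk of lines, that each map task produces one sorted output per partition (real map tasks may write several spills and merge them), and it uses a simple deterministic byte-sum partitioner rather than Hadoop's default hash-based one.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def input_splits(text, split_size):
    """Cut the input into chunks of lines; each chunk stands in for one InputSplit."""
    lines = text.splitlines()
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def record_reader(split):
    """Turn a split into (key, value) records: key = line index in the split, value = the line."""
    for offset, line in enumerate(split):
        yield offset, line


def mapper(key, value):
    """Map function for word count: emit (word, 1) for every word in the line."""
    for word in value.split():
        yield word.lower(), 1


def partition(key, num_reducers):
    """Pick the reducer for a key. Illustrative stand-in, not Hadoop's partitioner."""
    return sum(key.encode("utf-8")) % num_reducers


def run_map_task(split, num_reducers):
    """Run one map task; return its output grouped by partition and sorted by key."""
    partitions = defaultdict(list)
    for key, value in record_reader(split):
        for out_key, out_value in mapper(key, value):
            partitions[partition(out_key, num_reducers)].append((out_key, out_value))
    # Sorting per partition mimics the sorted intermediate data a map task leaves on local disk.
    return {p: sorted(pairs) for p, pairs in partitions.items()}


def shuffle(map_outputs, reducer_id):
    """Reducer side: collect this reducer's partition from every map task, then sort by key."""
    fetched = []
    for output in map_outputs:
        fetched.extend(output.get(reducer_id, []))
    return sorted(fetched, key=itemgetter(0))


def reducer(key, values):
    """Reduce function for word count: sum the counts for one key."""
    return key, sum(values)


def run_job(text, split_size=2, num_reducers=2):
    """Run all map tasks, then shuffle and reduce once per reducer."""
    map_outputs = [run_map_task(s, num_reducers) for s in input_splits(text, split_size)]
    results = {}
    for reducer_id in range(num_reducers):
        merged = shuffle(map_outputs, reducer_id)
        # After sorting, all values for the same key are adjacent, so they can be grouped.
        for key, group in groupby(merged, key=itemgetter(0)):
            word, total = reducer(key, (count for _, count in group))
            results[word] = total
    return results


if __name__ == "__main__":
    print(run_job("the quick fox\nthe lazy dog\nthe end"))
```

The point to notice is the order: the partition is chosen on the map side for each emitted pair, the sort happens before the data leaves the map task, and the reducer only sees its own partition with all values for a key grouped together. Because every occurrence of a key goes to the same reducer, the final counts do not depend on how many splits or reducers are used.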

You can write your own MapReduce programs, but more typically the MapReduce jobs are generated for you when you run a Hive or Pig job.

Tez

If you are running Hive or Pig queries, you should run them in Tez mode. Tez is an alternative processing engine to MapReduce and is typically much faster.

See:

http://hortonworks.com/apache/tez/

http://www.slideshare.net/Hadoop_Summit/w-1205phall1saha

maheshmsh88 · ‎11-14-2016

Hi Amit,

Please refer to this video; it explains the MapReduce flow chart very well. I hope it is useful.

https://www.youtube.com/watch?v=6OemZEJdMp8

Thanks,

Mahesh
