Member since 11-22-2015
40 Posts
18 Kudos Received
0 Solutions
02-06-2016
06:59 AM
1 Kudo
Let’s take an example. An MR job runs on a cluster of 10 DataNodes. Imagine this job needs 10 mappers and 2 reducers.
1) Say 2 map tasks run concurrently on each of 5 DataNodes, so 10 mappers execute simultaneously in total.
2) The output of each map task (or the Combiner's result, if a Combiner is used) is stored on the local filesystem of that DataNode.
3) This intermediate data must be exchanged between all nodes (the shuffle phase), sorted, and given to the 2 reduce tasks.
So we have 5 DataNodes running map tasks. On which node does partitioning happen, and how many partitions will be created?
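In Hadoop, the partitioner is called inside each map task as records are emitted, so partitioning happens on whichever node runs that map task, and each map task's output is split into as many partitions as there are reduce tasks. A minimal sketch of that idea follows, assuming the formula of Hadoop's default `HashPartitioner` and made-up sample keys; the class and method names (`PartitionSketch`, `partitionMapOutput`, `totalSegments`) are hypothetical and this is not Hadoop's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of map-side partitioning for the example above
// (10 map tasks, 2 reduce tasks). Only the partition formula mirrors Hadoop.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Splits one map task's output keys into numReduceTasks buckets.
    // In the real framework this happens inside each map task, on its own node.
    static List<List<String>> partitionMapOutput(List<String> keys, int numReduceTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(getPartition(key, numReduceTasks)).add(key);
        }
        return partitions;
    }

    // Map-side segments in total: one per (map task, reducer) pair.
    static int totalSegments(int numMapTasks, int numReduceTasks) {
        return numMapTasks * numReduceTasks;
    }

    public static void main(String[] args) {
        int numMapTasks = 10;
        int numReduceTasks = 2;
        // Hypothetical output of a single map task.
        List<String> sampleOutput = List.of("apple", "banana", "cherry", "apple", "date");
        List<List<String>> parts = partitionMapOutput(sampleOutput, numReduceTasks);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + " (for reducer " + p + "): " + parts.get(p));
        }
        System.out.println("segments across all map tasks: "
                + totalSegments(numMapTasks, numReduceTasks));
    }
}
```

Under these assumptions, each of the 10 map tasks writes 2 partitions, giving 20 map-side segments; during the shuffle, reducer 0 fetches partition 0 from all 10 map outputs and reducer 1 fetches partition 1.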
02-04-2016
07:44 PM
1 Kudo
Thanks, Neeraj.
02-04-2016
07:44 PM
1 Kudo
Thanks, Arpit. Section 8.3.1 was a very good read.
02-04-2016
06:02 PM
1 Kudo
According to Hadoop: The Definitive Guide, 3rd edition, Chapter 6, section "The Map Side": "Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort." However, the Yahoo developers tutorial says the Combiner runs before the Partitioner. That is why I am confused. Can you please look into it and let me know which is correct?
02-04-2016
06:00 PM
1 Kudo
According to Hadoop: The Definitive Guide, 3rd edition, Chapter 6, section "The Map Side":
"Before it writes to disk, the thread first divides the data into partitions corresponding to the reducers that they will ultimately be sent to. Within each partition, the background thread performs an in-memory sort by key, and if there is a combiner function, it is run on the output of the sort."
However, the Yahoo developers tutorial says the Combiner runs before the Partitioner.
That is why I am confused. Can you please look into it and let me know which is correct?
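The order described in the quoted passage (partition, then sort within each partition, then combine) can be sketched in a few lines. In Hadoop's map task the partition number is likewise computed when each record is collected into the output buffer, so by the time the buffer is sorted and spilled, the combiner sees data that is already partitioned and sorted. This is a minimal in-memory sketch, not Hadoop's actual output buffer: it assumes a word-count job whose combiner sums counts, and `MapSideSpillSketch`, `Rec`, and `spill` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of one map-side spill in the order the book describes:
// 1) tag each record with its partition, 2) sort by (partition, key),
// 3) run the combiner over each partition's sorted records.
class MapSideSpillSketch {

    record Rec(int partition, String key, int value) {}

    // Default HashPartitioner formula.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns, for each partition, the combined (summed) records in key order.
    static List<List<Rec>> spill(List<String> words, int numReduceTasks) {
        // Step 1: the partition is computed as each (word, 1) record is collected.
        List<Rec> buffer = new ArrayList<>();
        for (String w : words) {
            buffer.add(new Rec(getPartition(w, numReduceTasks), w, 1));
        }
        // Step 2: in-memory sort by partition first, then by key within the partition.
        buffer.sort(Comparator.comparingInt(Rec::partition).thenComparing(Rec::key));

        // Step 3: combiner (a word-count sum) runs on each partition's sorted run.
        // Equal keys are adjacent after the sort, so merging neighbours is enough.
        List<List<Rec>> out = new ArrayList<>();
        for (int p = 0; p < numReduceTasks; p++) {
            out.add(new ArrayList<>());
        }
        for (Rec r : buffer) {
            List<Rec> part = out.get(r.partition());
            int last = part.size() - 1;
            if (last >= 0 && part.get(last).key().equals(r.key())) {
                Rec prev = part.get(last);
                part.set(last, new Rec(prev.partition(), prev.key(), prev.value() + r.value()));
            } else {
                part.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("b", "a", "c", "a", "b", "a");
        List<List<Rec>> spilled = spill(words, 2);
        for (int p = 0; p < spilled.size(); p++) {
            System.out.println("partition " + p + ": " + spilled.get(p));
        }
    }
}
```

In this model the combiner never mixes records from different partitions, which is consistent with the book's wording that it runs on the output of the per-partition sort.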
02-04-2016
05:03 PM
Thanks, Artem.
02-04-2016
05:02 PM
1 Kudo
Thank you very much. The pictures you posted answered my question. A visual representation makes it much easier to understand.
02-04-2016
07:35 AM
1 Kudo
What is the difference between the Partitioner, Combiner, and Shuffle and Sort phases in MapReduce? In what order are these phases executed?
My understanding of the process flow is as follows:
1) The output of each map task is partitioned and sorted in memory, and the Combiner function runs on it. This output, called the intermediate data, is written to the local disk.
2) The intermediate data from all the DataNodes goes through a phase called Shuffle and Sort, which is handled by the Hadoop framework.
3) The sorted output is given as input to the reducers.
Please verify whether this process flow is correct and share your inputs.
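Steps 2 and 3 can be sketched from the point of view of a single reducer: it fetches its partition's already sorted segment from every map task (the shuffle), merges those segments into one key-sorted stream (the sort/merge), and then applies the reduce function to each key's values. This is a minimal in-memory sketch assuming a word-count job; the real framework may also spill to disk and merge in several passes, and `ReduceSideSketch`, `Cursor`, and `mergeAndReduce` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the reduce side: one reducer merges the sorted segments
// it fetched from every map task, then sums the values for each key.
class ReduceSideSketch {

    // A read position inside one fetched, key-sorted segment.
    record Cursor(List<Map.Entry<String, Integer>> segment, int pos) {
        Map.Entry<String, Integer> head() {
            return segment.get(pos);
        }
    }

    // k-way merge with a min-heap keyed on each segment's current key, followed by
    // a word-count style reduce. Equal keys come out of the merge consecutively.
    static Map<String, Integer> mergeAndReduce(List<List<Map.Entry<String, Integer>>> segments) {
        PriorityQueue<Cursor> heap = new PriorityQueue<Cursor>(
                (a, b) -> a.head().getKey().compareTo(b.head().getKey()));
        for (List<Map.Entry<String, Integer>> seg : segments) {
            if (!seg.isEmpty()) {
                heap.add(new Cursor(seg, 0));
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps keys in merged (sorted) order
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Map.Entry<String, Integer> e = c.head();
            result.merge(e.getKey(), e.getValue(), Integer::sum); // reduce: sum the counts
            if (c.pos() + 1 < c.segment().size()) {
                heap.add(new Cursor(c.segment(), c.pos() + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Partition 0 as fetched from three hypothetical map tasks,
        // each already sorted (and combined) on the map side.
        List<List<Map.Entry<String, Integer>>> fetched = List.of(
                List.of(Map.entry("apple", 2), Map.entry("cherry", 1)),
                List.of(Map.entry("apple", 1), Map.entry("banana", 3)),
                List.of());
        System.out.println(mergeAndReduce(fetched)); // prints {apple=3, banana=3, cherry=1}
    }
}
```

A heap-based merge is used here because each fetched segment is already sorted, so the reducer only needs to interleave them rather than re-sort everything.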
Labels: Apache Hadoop
02-03-2016
09:15 PM
3 Kudos
In an HDFS write, suppose I am writing a 1 GB file with a 64 MB block size, so 16 blocks are created.
I want to know whether an acknowledgement is sent to the client for each block written, or only once all blocks are written.
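For context, the HDFS client splits each block into packets, streams them down the pipeline of DataNodes, and keeps every sent packet in an ack queue until an acknowledgement for that packet comes back through the whole pipeline; on close, it waits for the outstanding acks before asking the NameNode to complete the file. The sketch below models only that packet-level flow plus the block arithmetic from the question. It assumes, for simplicity, a 64 KB data payload per packet (real packets also carry a header and checksums) and a hypothetical in-flight window of 80 packets; `HdfsWriteAckSketch`, `numBlocks`, and `writeBlock` are illustrative names, not HDFS APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Idealized, hypothetical model of an HDFS client write: per-packet acks, not per-block.
class HdfsWriteAckSketch {

    // Number of blocks for a file (ceiling division).
    static long numBlocks(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Streams one block as packets. A packet moves from the data queue to the ack queue
    // when sent, and is removed once the pipeline acknowledges it (assumed to always
    // succeed here). Returns how many acknowledgements the client processed.
    static int writeBlock(long blockBytes, long packetBytes, int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        Deque<Long> dataQueue = new ArrayDeque<>();
        Deque<Long> ackQueue = new ArrayDeque<>();
        long packets = (blockBytes + packetBytes - 1) / packetBytes;
        for (long seq = 0; seq < packets; seq++) {
            dataQueue.addLast(seq);
        }
        int acks = 0;
        while (!dataQueue.isEmpty() || !ackQueue.isEmpty()) {
            // Sender: keep packets flowing without waiting for each ack, up to the window.
            while (!dataQueue.isEmpty() && ackQueue.size() < maxInFlight) {
                ackQueue.addLast(dataQueue.pollFirst());
            }
            // Ack processor: the oldest in-flight packet is acknowledged first.
            ackQueue.pollFirst();
            acks++;
        }
        return acks;
    }

    public static void main(String[] args) {
        long fileBytes = 1L << 30;   // 1 GB
        long blockBytes = 64L << 20; // 64 MB
        long packetBytes = 64L << 10; // idealized 64 KB of data per packet
        long blocks = numBlocks(fileBytes, blockBytes);
        int acksPerBlock = writeBlock(blockBytes, packetBytes, 80);
        System.out.println(blocks + " blocks, " + acksPerBlock + " packet acks per block");
    }
}
```

Under these simplifying assumptions the run prints `16 blocks, 1024 packet acks per block`, illustrating that acknowledgements arrive per packet throughout the write rather than once per block or once per file.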
Labels: Apache Hadoop
01-30-2016
01:45 PM
Ah, that's a good catch. Thanks, Joseph.