Explorer
Posts: 8
Registered: 01-20-2017

A basic question on the Hadoop 1.0 MapReduce process.

Dear folks,

I'm aware that in Hadoop 2.0 the JobTracker architecture was restructured out of MapReduce. Still, your help in understanding the Hadoop 1.0 concept is highly appreciated.

Process:
The JobTracker receives a job from the client, communicates with the NameNode for the required block information, and distributes the split input to TaskTrackers to get the tasks done.
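
For reference, here is a minimal classic-API (org.apache.hadoop.mapred) submission of the kind I mean. It is only a sketch: the HDFS paths are placeholders, and it leans on the framework defaults (TextInputFormat plus the identity mapper and reducer), so it simply copies its input through the full JobTracker flow.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJob.class);
        conf.setJobName("basic-example");

        // Placeholder HDFS paths; substitute real input/output locations.
        FileInputFormat.setInputPaths(conf, new Path("/user/example/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/example/output"));

        // Match the defaults: TextInputFormat emits (LongWritable, Text)
        // records, which the identity mapper/reducer pass through unchanged.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // JobClient hands the job to the JobTracker, which asks the
        // NameNode for block locations and schedules the TaskTrackers.
        JobClient.runJob(conf);
    }
}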


My Question:
1. When the input data (a 64 MB or 128 MB block) sits in a data block, does the JobTracker split it further into small input records before assigning it to the map/reduce functions, or does it just forward the data block, assuming the entire block is required for processing?

2. Does the JobTracker choose a TaskTracker where the input data block is located, or is its choice completely random?

Thanks in advance for your help.

Srujan

Cloudera Employee
Posts: 39
Registered: 12-14-2016

Re: A basic question on the Hadoop 1.0 MapReduce process.

Hi Srujan,

  1. By default, the input split size is the same as the HDFS block size. However, setting the maximum input split size smaller than the block size will force smaller splits, since the split size is calculated as max(minimumSize, min(maximumSize, blockSize)); see the sketch after this list. Recall that input splits are just references to the location of the data, not copies of it. The individual records are produced later, by the RecordReader running inside each task, not by the JobTracker.
  2. No, it is not random: the JobTracker prefers TaskTracker nodes that are closest to the data. It tries a TaskTracker on the same node as the block first, then one on the same rack, and only then falls back to a remote node.
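
To make the arithmetic in point 1 concrete, here is a minimal sketch in plain Java. It mirrors the max(minimumSize, min(maximumSize, blockSize)) rule quoted above; the byte values are illustrative, not defaults read from any cluster.

public class SplitSizeDemo {

    // Same arithmetic as the split-size formula above.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // one 64 MB HDFS block

        // Defaults (minSize = 1, maxSize = Long.MAX_VALUE): split == block.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));    // 67108864

        // Maximum lowered to 16 MB: each 64 MB block yields four splits.
        System.out.println(computeSplitSize(blockSize, 1L, 16L * 1024 * 1024)); // 16777216
    }
}

In Hadoop 1.x the maximum is typically lowered through the mapred.max.split.size property (or FileInputFormat.setMaxInputSplitSize() in the new API), and the minimum through mapred.min.split.size.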

Hope that answers your questions!
