A basic question on the Hadoop 1.0 MapReduce process.

Dear folks,

I'm aware that in Hadoop 2.0 the JobTracker architecture was restructured and MapReduce now runs on YARN; still, your help in understanding the Hadoop 1.0 concept is highly appreciated.

Process:
The JobTracker receives a job from the client, communicates with the NameNode for the locations of the input blocks, and distributes the input splits to TaskTrackers to get the tasks done.
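For reference, here is a minimal sketch of that flow using the Hadoop 1.0 "mapred" API. The job name and the commented-out mapper/reducer classes are illustrative placeholders, not part of any real job:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitJob.class);
            conf.setJobName("basic-example");

            // Hypothetical mapper/reducer classes, assumed to be defined elsewhere:
            // conf.setMapperClass(MyMapper.class);
            // conf.setReducerClass(MyReducer.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // runJob() hands the job to the JobTracker, which asks the NameNode
            // for block locations and schedules tasks on TaskTrackers.
            JobClient.runJob(conf);
        }
    }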

My Question:
1. When the input data (a 64 MB or 128 MB block) sits in a data block, does the JobTracker further split it into smaller input records before assigning it to the map/reduce function, or does it just forward the data block, assuming the entire block is required for processing?

2. Does the JobTracker choose a TaskTracker where the input data block is located, or is the choice completely random?

Thanks in advance for your help.

Srujan

Re: A basic question on the Hadoop 1.0 MapReduce process.

Hi Srujan,

  1. By default, the input split size is the same as the block size. However, setting the maximum input split size smaller than the block size will force smaller splits, since the split size is calculated as max(minimumSize, min(maximumSize, blockSize)); see the sketch after this list. Recall that input splits are just references to the location of the data, not the data itself.
  2. It is not random: the JobTracker prefers TaskTracker nodes that are closest to the data, assigning data-local tasks when possible.
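
To make the split-size rule concrete, here is a small self-contained sketch that mirrors the max(minimumSize, min(maximumSize, blockSize)) calculation; the sizes below are just example values, not defaults read from a real cluster:

    public class SplitSizeDemo {
        // splitSize = max(minimumSize, min(maximumSize, blockSize))
        static long computeSplitSize(long minSize, long maxSize, long blockSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024; // a 64 MB HDFS block

            // With default-like settings the split size equals the block size:
            System.out.println(computeSplitSize(1, Long.MAX_VALUE, blockSize)); // 67108864 (64 MB)

            // Capping the maximum split size below the block size forces smaller splits:
            System.out.println(computeSplitSize(1, 16L * 1024 * 1024, blockSize)); // 16777216 (16 MB)
        }
    }

So with the defaults, each map task processes one block's worth of records.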

Hope that answers your questions!