Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?

As per The Definitive Guide:

  • Mapper, as in the map task: spawned by the TaskTracker in a separate JVM to process an entire input split. For TextInputFormat, this would be a specific number of lines from your input file.
  • map method: called for every record (key-value pair) in the split, i.e. Mapper.map(...). In the case of TextInputFormat, each invocation of the map method processes one line of your input split.

With the above consideration the TaskTracker spawns a new Mapper for each input split.

But if you look at the Mapper class code-

 public class MaxTemperatureMapper
     extends Mapper<LongWritable, Text, Text, IntWritable> {

This seems to suggest that a Mapper object takes one key/value pair at a time: once that k/v pair has been processed, the object is finished, and the next k/v pair is processed by another, new Mapper object.

For example, think of a 64 MB block that contains 1000 records (key-value pairs). Does the framework create 1000 mappers here, or just a single mapper?

This is a little confusing. Can anyone shed more light on what exactly happens in this case?

Thanks in advance.

1 ACCEPTED SOLUTION

Re: Does the TaskTracker spawns a new Mapper for each input split or for each key-value pair?

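@Gangadhar Kadam One map task is initiated for each input split (by default, an input split typically corresponds to one file block). The number of map tasks does not depend on the number of records (K, V pairs) in the split. Within that task, a single Mapper object is created and its map() method is called once for every record in the split; a new Mapper is not created per record. So, if you have m input splits, m map tasks will be initiated. The number of task attempts can be more than m if you have speculative execution turned on, since a slow task may be run a second time in parallel.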

With respect to your example: if your 64 MB file has 1000 records and occupies one block, then only one map task would be triggered, and its map() method would be called 1000 times.
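
To make the difference between "one Mapper per split" and "one map() call per record" concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Hadoop API: SketchMapper, runOneMapTask and the record strings are hypothetical names made up for illustration, and the run() loop is only a simplified model of how the framework drives a Mapper (setup, then map() once per record, then cleanup).

```java
import java.util.ArrayList;
import java.util.List;

public class MapperLifecycleSketch {

    // Simplified stand-in for a Mapper; NOT org.apache.hadoop.mapreduce.Mapper.
    static class SketchMapper {
        static int instancesCreated = 0; // how many mapper objects were built
        int mapCalls = 0;                // how many records this object handled

        SketchMapper() {
            instancesCreated++;
        }

        void setup() { /* called once per task, before the first record */ }

        void map(long key, String value) {
            mapCalls++; // a real mapper would emit output key/value pairs here
        }

        void cleanup() { /* called once per task, after the last record */ }

        // One object, many map() calls: the loop runs over every record in the split.
        void run(List<String> split) {
            setup();
            long offset = 0;
            for (String line : split) {
                map(offset, line);
                offset += line.length() + 1; // assumed: key is a running offset, as with text input
            }
            cleanup();
        }
    }

    // Models one map task processing one split.
    // Returns {mapper objects created for this task, map() invocations}.
    static int[] runOneMapTask(List<String> split) {
        int before = SketchMapper.instancesCreated;
        SketchMapper mapper = new SketchMapper();
        mapper.run(split);
        return new int[] { SketchMapper.instancesCreated - before, mapper.mapCalls };
    }

    public static void main(String[] args) {
        List<String> split = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            split.add("record-" + i); // hypothetical records standing in for lines of a block
        }
        int[] counts = runOneMapTask(split);
        System.out.println("mapper instances: " + counts[0] + ", map() calls: " + counts[1]);
    }
}
```

For the 1000-record example this reports one mapper instance and 1000 map() calls, which is the behaviour described above: the class declaration says what types a single map() call receives, not how many Mapper objects are created.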

Re: Does the TaskTracker spawns a new Mapper for each input split or for each key-value pair?

Thanks Pradeep!

Re: Does the TaskTracker spawns a new Mapper for each input split or for each key-value pair?

@Gangadhar Kadam As a best practice, please accept the answer if you are satisfied with it. Then we can close this question.
