Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?

Expert Contributor

According to The Definitive Guide:

  • Mapper: the map task that the TaskTracker spawns in a separate JVM to process an entire input split. For TextInputFormat, this would be a specific number of lines from your input file.
  • map method: called once for every record (key-value pair) in the split, i.e. Mapper.map(...). With TextInputFormat, each invocation of map processes one line of your input split.

With the above in mind, the TaskTracker spawns a new Mapper for each input split.

But if you look at the Mapper class code:

 public class MaxTemperatureMapper
     extends Mapper<LongWritable, Text, Text, IntWritable> {

This seems to mean that the Mapper object takes one key/value pair at a time: once that pair has been processed, the object is finished, and the next pair is processed by another, new Mapper object.

For example, consider a 64 MB block that contains 1000 records (key-value pairs). Does the framework create 1000 mappers here, or just a single mapper?

This is a little confusing. Can anyone explain what exactly happens in this case?

Thanks in advance.

1 ACCEPTED SOLUTION

@Gangadhar Kadam One map task is initiated for each input split or file block. This does not depend on the number of records (key-value pairs) in that block or input split. So, if you have m blocks or input splits, at least m map tasks will be initiated. It can be more than m if you have speculative execution turned on, because duplicate attempts of slow tasks may be launched.
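
As a rough illustration of the "m splits, m map tasks" arithmetic, here is a minimal sketch that estimates the task count from a file size and a split size. The class and method names (`SplitCountSketch`, `estimateMapTasks`) are made up for this example and are not part of Hadoop. It assumes one split per full or partial chunk of the file, and it ignores speculative execution and any tolerance a real input format may apply to a small final split.

```java
// Hypothetical helper for illustration only; not a Hadoop API.
class SplitCountSketch {

    /**
     * Estimates the number of map tasks for a single non-empty file,
     * assuming one task per split and one split per (possibly partial)
     * chunk of splitSizeBytes.
     */
    static long estimateMapTasks(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes <= 0 || splitSizeBytes <= 0) {
            // Empty files and invalid sizes are out of scope for this sketch.
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: a partial trailing chunk still needs its own task.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 64 MB file with a 64 MB split size fits in a single split.
        System.out.println(estimateMapTasks(64 * mb, 64 * mb));   // prints 1
        // A 200 MB file needs three full splits plus one partial split.
        System.out.println(estimateMapTasks(200 * mb, 64 * mb));  // prints 4
    }
}
```

Note that the record count never appears in the calculation: only the sizes matter.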

With respect to your example: if your 64 MB file has 1000 records and occupies one block, then only one map task would be triggered.
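
To make the difference between the Mapper object and the map method concrete, here is a small self-contained simulation. It does not use Hadoop itself: `MapTaskSketch` and `CountingMapper` are hypothetical stand-ins, and the `run` loop only mirrors the general shape of Hadoop's `Mapper.run()`, which calls setup once, then map for each record, then cleanup once. The split is simulated as an in-memory list of 1000 lines.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the framework; these are NOT Hadoop's real classes.
class MapTaskSketch {

    /** Hypothetical mapper that counts how often it is created and called. */
    static class CountingMapper {
        static int instancesCreated = 0;
        int mapCalls = 0;

        CountingMapper() {
            instancesCreated++;
        }

        void setup() { }                 // would run once per map task

        void map(long offset, String line) {
            mapCalls++;                  // runs once per record in the split
        }

        void cleanup() { }               // would run once per map task

        // One object loops over every record of the split.
        void run(Iterator<String> records) {
            setup();
            try {
                long offset = 0;
                while (records.hasNext()) {
                    String line = records.next();
                    map(offset, line);
                    offset += line.length() + 1; // +1 for the newline
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Builds a fake split of n text records. */
    static List<String> makeSplit(int n) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lines.add("record-" + i);
        }
        return lines;
    }

    /** Runs one simulated map task; returns {mapper instances, map() calls}. */
    static int[] runTaskOverSplit(List<String> split) {
        CountingMapper.instancesCreated = 0;
        CountingMapper mapper = new CountingMapper();
        mapper.run(split.iterator());
        return new int[] { CountingMapper.instancesCreated, mapper.mapCalls };
    }

    public static void main(String[] args) {
        int[] result = runTaskOverSplit(makeSplit(1000));
        // prints "Mapper instances: 1, map() calls: 1000"
        System.out.println("Mapper instances: " + result[0]
                + ", map() calls: " + result[1]);
    }
}
```

In this sketch a single mapper object handles all 1000 records, which matches the behaviour described above: the type parameters on the Mapper class describe what each map call receives, not how many objects are created.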

3 REPLIES

Expert Contributor

Thanks Pradeep!

@Gangadhar Kadam As a best practice, please accept the answer if you are satisfied with it, so that we can close this question.