Created 12-28-2015 07:58 PM
As per The Definitive Guide:
With the above consideration, the TaskTracker spawns a new Mapper for each input split.
But if you look at the Mapper class code:
public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
This seems to mean that a Mapper object takes one key/value pair at a time: once that pair has been processed, the object is finished, and the next key/value pair is processed by another, new Mapper object.
For example, suppose a 64 MB block contains 1000 records (key/value pairs). Does the framework create 1000 mappers here, or just a single mapper?
This is a little confusing. Can anyone shed more light on what exactly happens in this case?
Thanks in advance.
Created 12-28-2015 09:10 PM
@Gangadhar Kadam For each input split or file block, one map task is initiated. It doesn't depend on the number of records (K, V pairs) in that block or input split. So, if you have m blocks or input splits, m map tasks will be initiated. More than m task attempts can run if you have speculative execution turned on, because a slow task may be launched a second time.
W.r.t. your example, if your 64 MB file has 1000 records and occupies one block, then only one map task would be triggered. Within that task, map() is called once for each of the 1000 records.
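To make the split-versus-record distinction concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The names MapTaskSketch, CountingMapper, and simulate are made up for illustration. The run() loop is only a simplified stand-in for Hadoop's Mapper.run(), which calls map() once for every key/value pair in the task's split.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (not Hadoop code) of how one map task handles its split.
public class MapTaskSketch {

    // Hypothetical stand-in for a user-written Mapper subclass.
    static class CountingMapper {
        int mapCalls = 0;

        // Called once per record, like Mapper.map(key, value, context).
        void map(long offset, String line) {
            mapCalls++;
        }

        // Simplified version of the record loop: one map() call per record in the split.
        void run(List<String> split) {
            long offset = 0;
            for (String line : split) {
                map(offset, line);
                offset += line.length() + 1; // pretend each record ends with a newline
            }
        }
    }

    // Returns {number of mapper objects created, total number of map() calls}.
    public static int[] simulate(int numSplits, int recordsPerSplit) {
        int mappers = 0;
        int calls = 0;
        for (int s = 0; s < numSplits; s++) {
            List<String> split = new ArrayList<>();
            for (int r = 0; r < recordsPerSplit; r++) {
                split.add("record-" + r);
            }
            CountingMapper mapper = new CountingMapper(); // one object per split
            mappers++;
            mapper.run(split);
            calls += mapper.mapCalls;
        }
        return new int[] {mappers, calls};
    }

    public static void main(String[] args) {
        int[] result = simulate(1, 1000);
        // prints "mapper objects: 1, map() calls: 1000"
        System.out.println("mapper objects: " + result[0] + ", map() calls: " + result[1]);
    }
}
```

So the type parameters on Mapper<LongWritable, Text, Text, IntWritable> describe what a single map() call receives and emits, not how many records one Mapper object handles over its lifetime.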
Created 12-29-2015 01:18 AM
Thanks Pradeep!
Created 12-30-2015 06:27 PM
@Gangadhar Kadam As a best practice, please accept the answer if you are satisfied with it. Then we can close this question.