<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100922#M13666</link>
    <description>&lt;P&gt;As per The Definitive Guide:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Mapper&lt;/STRONG&gt;, as in the map task spawned by the TaskTracker in a separate JVM to process an input split (all of it). For TextInputFormat, this would be a specific number of lines from your input file.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Map method&lt;/STRONG&gt;, which is called for every record (key-value pair) in the split: Mapper.map(...). In the case of TextInputFormat, each invocation of the map method processes one line of your input split.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Given the above, the TaskTracker spawns a new Mapper for each input split.&lt;/P&gt;&lt;P&gt;But if you look at the Mapper class code:&lt;/P&gt;&lt;PRE&gt; public class MaxTemperatureMapper
     extends Mapper&amp;lt;LongWritable, Text, Text, IntWritable&amp;gt; {&lt;/PRE&gt;
&lt;P&gt;This suggests that the Mapper class/object takes one key/value pair at a time: once that k/v pair has been processed, the object is finished, and the next k/v pair is processed by another Mapper, a new object.&lt;/P&gt;&lt;P&gt;For example, suppose a 64 MB block contains 1000 records (key-value pairs). Does the framework create 1000 mappers here, or just a single mapper?&lt;/P&gt;&lt;P&gt;This is a little confusing. Can anyone shed more light on what exactly happens in this case?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Tue, 29 Dec 2015 03:58:26 GMT</pubDate>
    <dc:creator>GeeKay2015</dc:creator>
    <dc:date>2015-12-29T03:58:26Z</dc:date>
    <item>
      <title>Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100922#M13666</link>
      <description>&lt;P&gt;As per The Definitive Guide:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Mapper&lt;/STRONG&gt;, as in the map task spawned by the TaskTracker in a separate JVM to process an input split (all of it). For TextInputFormat, this would be a specific number of lines from your input file.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Map method&lt;/STRONG&gt;, which is called for every record (key-value pair) in the split: Mapper.map(...). In the case of TextInputFormat, each invocation of the map method processes one line of your input split.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Given the above, the TaskTracker spawns a new Mapper for each input split.&lt;/P&gt;&lt;P&gt;But if you look at the Mapper class code:&lt;/P&gt;&lt;PRE&gt; public class MaxTemperatureMapper
     extends Mapper&amp;lt;LongWritable, Text, Text, IntWritable&amp;gt; {&lt;/PRE&gt;
&lt;P&gt;This suggests that the Mapper class/object takes one key/value pair at a time: once that k/v pair has been processed, the object is finished, and the next k/v pair is processed by another Mapper, a new object.&lt;/P&gt;&lt;P&gt;For example, suppose a 64 MB block contains 1000 records (key-value pairs). Does the framework create 1000 mappers here, or just a single mapper?&lt;/P&gt;&lt;P&gt;This is a little confusing. Can anyone shed more light on what exactly happens in this case?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Dec 2015 03:58:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100922#M13666</guid>
      <dc:creator>GeeKay2015</dc:creator>
      <dc:date>2015-12-29T03:58:26Z</dc:date>
    </item>
    <item>
      <title>Re: Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100923#M13667</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/971/gkadam2011.html" nodeid="971"&gt;@Gangadhar Kadam&lt;/A&gt; For each input split or file block, one map task is initiated. This does not depend on the number of records (K, V pairs) in that block or input split. So, if you have m blocks or input splits, at least m map tasks will be initiated. There can be more than m if you have speculative execution turned on.&lt;/P&gt;&lt;P&gt;With respect to your example, if your 64 MB file has 1000 records and occupies one block, then only one map task would be triggered.&lt;/P&gt;</description>
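      <!--
      The point made in this answer can be illustrated with a small sketch. It does not use
      Hadoop's real API: SketchMapper and MapperLifecycleSketch are hypothetical stand-ins,
      and the three-line split is made-up sample data. The sketch only assumes what the answer
      states, plus the fact that Hadoop's Mapper.run() loops over the records of its split and
      calls map() once per record. One mapper object is created for the split, and map() is
      invoked once for each record in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver: plays the role of one map task handling one input split.
class MapperLifecycleSketch {
    // Processes one split with one mapper object; returns {objects created, map() calls}.
    static int[] runSplit(List<String> split) {
        SketchMapper.instancesCreated = 0;
        SketchMapper mapper = new SketchMapper();
        mapper.run(split, new ArrayList<>());
        return new int[] { SketchMapper.instancesCreated, mapper.mapCalls };
    }

    public static void main(String[] args) {
        int[] counts = runSplit(Arrays.asList("line one", "line two", "line three"));
        // prints "mapper objects: 1, map() calls: 3"
        System.out.println("mapper objects: " + counts[0] + ", map() calls: " + counts[1]);
    }
}

// Hypothetical stand-in for a Hadoop Mapper; not the real org.apache.hadoop class.
class SketchMapper {
    static int instancesCreated = 0; // how many mapper objects have been constructed
    int mapCalls = 0;                // how many times map() ran on this object

    SketchMapper() {
        instancesCreated++;
    }

    // Called once per record: key = byte offset of the line, value = the line text.
    void map(long offset, String line, List<String> output) {
        mapCalls++;
        output.add(offset + "\t" + line);
    }

    // Same shape as Mapper.run(): a single loop over every record in the split.
    void run(List<String> split, List<String> output) {
        long offset = 0;
        for (String line : split) {
            map(offset, line, output);
            offset += line.length() + 1; // +1 for the newline that ended the line
        }
    }
}
```

      With the 1000-record block from the question, the same loop would give one mapper
      object and 1000 map() calls.
      -->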
      <pubDate>Tue, 29 Dec 2015 05:10:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100923#M13667</guid>
      <dc:creator>pardeep_kumar</dc:creator>
      <dc:date>2015-12-29T05:10:41Z</dc:date>
    </item>
    <item>
      <title>Re: Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100924#M13668</link>
      <description>&lt;P&gt;Thanks Pradeep!&lt;/P&gt;</description>
      <pubDate>Tue, 29 Dec 2015 09:18:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100924#M13668</guid>
      <dc:creator>GeeKay2015</dc:creator>
      <dc:date>2015-12-29T09:18:22Z</dc:date>
    </item>
    <item>
      <title>Re: Does the TaskTracker spawn a new Mapper for each input split or for each key-value pair?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100925#M13669</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/971/gkadam2011.html" nodeid="971"&gt;@Gangadhar Kadam&lt;/A&gt; As a best practice, please accept the answer if you are satisfied with it. Then we can close this question.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Dec 2015 02:27:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Does-the-TaskTracker-spawns-a-new-Mapper-for-each-input/m-p/100925#M13669</guid>
      <dc:creator>pardeep_kumar</dc:creator>
      <dc:date>2015-12-31T02:27:38Z</dc:date>
    </item>
  </channel>
</rss>

