<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: We can set the number of reduce tasks for the MapReduce jobs generated by Pig by&quot;set default parallel&quot;  or PARALLEL clause, but how set no of map tasks? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124733#M22694</link>
    <description>&lt;P&gt;thanks &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 14 Mar 2016 17:26:29 GMT</pubDate>
    <dc:creator>mhdeshmukh22</dc:creator>
    <dc:date>2016-03-14T17:26:29Z</dc:date>
    <item>
      <title>We can set the number of reduce tasks for the MapReduce jobs generated by Pig by"set default parallel"  or PARALLEL clause, but how set no of map tasks?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124730#M22691</link>
      <description />
      <pubDate>Sun, 13 Mar 2016 23:23:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124730#M22691</guid>
      <dc:creator>mhdeshmukh22</dc:creator>
      <dc:date>2016-03-13T23:23:22Z</dc:date>
    </item>
    <item>
      <title>Re: We can set the number of reduce tasks for the MapReduce jobs generated by Pig by"set default parallel"  or PARALLEL clause, but how set no of map tasks?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124731#M22692</link>
      <description>&lt;P&gt;You can't set the number of mappers directly; it is determined by the number of blocks in your dataset.&lt;/P&gt;&lt;P&gt;The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.&lt;/P&gt;&lt;P&gt;The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 maps for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.&lt;/P&gt;&lt;P&gt;Thus, if you expect 10TB of input data and have a blocksize of 128MB, you’ll end up with roughly 82,000 maps (10TB / 128MB = 81,920), unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Mapper" target="_blank"&gt;https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Mapper&lt;/A&gt;&lt;/P&gt;</description>
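      <content:encoded><![CDATA[
A quick sanity check of the map-count figure quoted in this reply, as a minimal sketch. It assumes one map task per block, a single large input (real jobs split each file separately), and binary units (1 TB = 1024^4 bytes). The helper name estimate_maps is illustrative only and is not part of any Hadoop or Pig API.

```python
def estimate_maps(input_bytes: int, block_bytes: int) -> int:
    """Rough map count, assuming one map task per input block."""
    # Integer ceiling division, so a partial trailing block still gets a map.
    return -(-input_bytes // block_bytes)


TB = 1024 ** 4
MB = 1024 ** 2

maps = estimate_maps(10 * TB, 128 * MB)
print(maps)  # 81920, which rounds to the 82,000 quoted in the tutorial
```
]]></content:encoded>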
      <pubDate>Sun, 13 Mar 2016 23:29:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124731#M22692</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-13T23:29:38Z</dc:date>
    </item>
    <item>
      <title>Re: We can set the number of reduce tasks for the MapReduce jobs generated by Pig by"set default parallel"  or PARALLEL clause, but how set no of map tasks?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124732#M22693</link>
      <description>&lt;DIV&gt;There is actually a way to change the number of mappers in Pig. Pig uses a CombineFileInputFormat to merge small files into bigger map tasks. This is enabled by default and can be tuned with the following parameters. For the rest, see what Artem said.&lt;/DIV&gt;&lt;UL&gt;
&lt;LI&gt;pig.maxCombinedSplitSize – Specifies the size, in bytes, of data to be processed by a single map. Smaller files are combined until this size is reached.&lt;/LI&gt;&lt;LI&gt;pig.splitCombination – Turns split combination on or off (set to “true” by default).&lt;/LI&gt;&lt;/UL&gt;</description>
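      <content:encoded><![CDATA[
To get a feel for how pig.maxCombinedSplitSize changes the number of maps for many small files, here is a rough sketch. It is a simplified greedy packing model, not Pig's actual split-combination algorithm (which also takes data locality into account). The function name and the file sizes are hypothetical, chosen only for illustration.

```python
def estimate_combined_maps(file_sizes, max_combined_split_size):
    """Greedy estimate: keep adding files to the current split until
    the next file would push it past the size limit."""
    maps = 0
    current = 0
    for size in file_sizes:
        if current > 0 and current + size > max_combined_split_size:
            maps += 1      # close the current split
            current = 0
        current += size
    if current > 0:
        maps += 1          # count the last, partially filled split
    return maps


MB = 1024 ** 2
small_files = [16 * MB] * 64  # hypothetical input: 64 files of 16 MB each

print(estimate_combined_maps(small_files, 128 * MB))  # 8
print(estimate_combined_maps(small_files, 64 * MB))   # 16
```

Under this model, halving the combined split size doubles the number of maps, which is the lever the two parameters above give you.
]]></content:encoded>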
      <pubDate>Mon, 14 Mar 2016 04:50:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124732#M22693</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-03-14T04:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: We can set the number of reduce tasks for the MapReduce jobs generated by Pig by"set default parallel"  or PARALLEL clause, but how set no of map tasks?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124733#M22694</link>
      <description>&lt;P&gt;thanks &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 17:26:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124733#M22694</guid>
      <dc:creator>mhdeshmukh22</dc:creator>
      <dc:date>2016-03-14T17:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: We can set the number of reduce tasks for the MapReduce jobs generated by Pig by"set default parallel"  or PARALLEL clause, but how set no of map tasks?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124734#M22695</link>
      <description>&lt;P&gt;thanks @Artem&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2016 17:26:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/We-can-set-the-number-of-reduce-tasks-for-the-MapReduce-jobs/m-p/124734#M22695</guid>
      <dc:creator>mhdeshmukh22</dc:creator>
      <dc:date>2016-03-14T17:26:59Z</dc:date>
    </item>
  </channel>
</rss>

