<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Any calculation to use  number of mappers and containers? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137107#M27661</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;1) The number of mappers depends on several factors, primarily the number of input splits, which is controlled by mapreduce.input.fileinputformat.split.minsize &amp;amp; mapreduce.input.fileinputformat.split.maxsize.&lt;/P&gt;&lt;P&gt;For example, a 5 GB file with both the minimum and maximum split size set to 1 GB will be processed by 5 mappers. This is just an illustration.&lt;/P&gt;&lt;P&gt;See this for recommended values -&amp;gt; &lt;A href="https://community.hortonworks.com/questions/2179/recommended-config-mapreduceinputfileinputformatsp.html" target="_blank"&gt;https://community.hortonworks.com/questions/2179/recommended-config-mapreduceinputfileinputformatsp.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) The number of containers depends on the container size. Read this for how to calculate the container size:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/" target="_blank"&gt;http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3) Distcp - read this &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Sys_Admin_Guides/content/ref-7dbacce5-2629-4e31-b143-e20df092f6d5.1.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Sys_Admin_Guides/content/ref-7dbacce5-2629-4e31-b143-e20df092f6d5.1.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Pranay Vyas&lt;/P&gt;</description>
    <pubDate>Mon, 09 May 2016 00:57:28 GMT</pubDate>
    <dc:creator>PranayV</dc:creator>
    <dc:date>2016-05-09T00:57:28Z</dc:date>
    <item>
      <title>Any calculation to use  number of mappers and containers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137106#M27660</link>
      <description>&lt;P&gt;1. How many mappers are required to process a 5 GB file? Is there a way to calculate the number of mappers, reducers, and containers to use?&lt;/P&gt;&lt;P&gt;2. How can the performance of distcp be improved?&lt;/P&gt;</description>
      <pubDate>Sun, 08 May 2016 23:32:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137106#M27660</guid>
      <dc:creator>Neyyu</dc:creator>
      <dc:date>2016-05-08T23:32:48Z</dc:date>
    </item>
    <item>
      <title>Re: Any calculation to use  number of mappers and containers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137107#M27661</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;1) The number of mappers depends on several factors, primarily the number of input splits, which is controlled by mapreduce.input.fileinputformat.split.minsize &amp;amp; mapreduce.input.fileinputformat.split.maxsize.&lt;/P&gt;&lt;P&gt;For example, a 5 GB file with both the minimum and maximum split size set to 1 GB will be processed by 5 mappers. This is just an illustration.&lt;/P&gt;&lt;P&gt;See this for recommended values -&amp;gt; &lt;A href="https://community.hortonworks.com/questions/2179/recommended-config-mapreduceinputfileinputformatsp.html" target="_blank"&gt;https://community.hortonworks.com/questions/2179/recommended-config-mapreduceinputfileinputformatsp.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2) The number of containers depends on the container size. Read this for how to calculate the container size:&lt;/P&gt;&lt;P&gt;&lt;A href="http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/" target="_blank"&gt;http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3) Distcp - read this &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Sys_Admin_Guides/content/ref-7dbacce5-2629-4e31-b143-e20df092f6d5.1.html" target="_blank"&gt;https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Sys_Admin_Guides/content/ref-7dbacce5-2629-4e31-b143-e20df092f6d5.1.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Pranay Vyas&lt;/P&gt;</description>
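<!--
Editor's sketch, not part of the original reply: the 5 GB illustration in the reply above can be
checked with a few lines of arithmetic. This assumes the split-size rule used by Hadoop's
FileInputFormat, max(minSize, min(maxSize, blockSize)), and a 128 MB HDFS block size, which is
an assumption and not stated in the thread. The class and method names are hypothetical. The
real getSplits logic also lets the last split run up to about 10% over the split size, so the
ceiling division here is only an estimate.

```java
public class SplitEstimate {
    // Split-size rule as used by FileInputFormat: max(minSize, min(maxSize, blockSize)).
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and therefore map tasks) for a single splittable file:
    // the ceiling of fileSize / splitSize.
    public static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long gb = 1024L * mb;
        long fileSize = 5 * gb;
        long blockSize = 128 * mb; // assumed HDFS block size

        // Case from the reply: min and max split size both set to 1 GB.
        long oneGbSplit = computeSplitSize(blockSize, gb, gb);
        System.out.println("1 GB splits: " + estimateSplits(fileSize, oneGbSplit));

        // Effective defaults (min = 1 byte, max = Long.MAX_VALUE): split size equals block size.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("Default splits: " + estimateSplits(fileSize, defaultSplit));
    }
}
```

With both limits at 1 GB the estimate is 5 mappers, matching the reply; with the effective
defaults and the assumed 128 MB block size it is 40.
-->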
      <pubDate>Mon, 09 May 2016 00:57:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137107#M27661</guid>
      <dc:creator>PranayV</dc:creator>
      <dc:date>2016-05-09T00:57:28Z</dc:date>
    </item>
    <item>
      <title>Re: Any calculation to use  number of mappers and containers?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137108#M27662</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/2668/rvgn77.html"&gt;kavitha velaga &lt;/A&gt;&lt;/P&gt;&lt;P&gt;1. The number of mappers depends on the number of InputSplits for the file; Hadoop launches as many mappers as required. Users cannot directly set the number of mappers via a property.&lt;/P&gt;&lt;P&gt;2. To control the number of mappers, the user has to control the number of input splits, which is not necessary unless custom logic requires it.&lt;/P&gt;&lt;P&gt;3. The user can control the number of reducers for an MR job by calling: job.setNumReduceTasks(numOfReducer);&lt;/P&gt;&lt;P&gt;numOfReducer can be 0 or any positive integer.&lt;/P&gt;&lt;P&gt;If you choose 0, the MR job will be a mapper-only job (no reducer means no aggregation).&lt;/P&gt;&lt;P&gt;There are some use cases where a reducer is not necessary, so setting numOfReducer=0 lets the MR job finish more quickly (as the job avoids the shuffle and sort).&lt;/P&gt;&lt;P&gt;4. Container size depends on how much memory your program requires in general.&lt;/P&gt;&lt;P&gt;5. Distcp - This ticket &lt;A href="https://issues.apache.org/jira/browse/HDFS-7535" target="_blank"&gt;https://issues.apache.org/jira/browse/HDFS-7535&lt;/A&gt; has improved distcp performance. To make distcp run faster, we could disable post-copy checks such as the checksum, but then we trade off reliability.&lt;/P&gt;&lt;P&gt;Hope this helps&lt;/P&gt;</description>
      <pubDate>Thu, 19 May 2016 16:55:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Any-calculation-to-use-number-of-mappers-and-containers/m-p/137108#M27662</guid>
      <dc:creator>pradeep_bhadani</dc:creator>
      <dc:date>2016-05-19T16:55:43Z</dc:date>
    </item>
  </channel>
</rss>

