<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: small input split size in hadoop2.2 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/11828#M1678</link>
    <description>The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, since this is a CDH users forum), as can be seen in the code: &lt;A target="_blank" href="https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L61"&gt;https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L61&lt;/A&gt; and &lt;A target="_blank" href="https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L335"&gt;https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L335&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.</description>
    <pubDate>Sat, 03 May 2014 14:28:46 GMT</pubDate>
    <dc:creator>Harsh J</dc:creator>
    <dc:date>2014-05-03T14:28:46Z</dc:date>
    <item>
      <title>small input split size in hadoop2.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/11566#M1677</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;I am using Hadoop 2.2 and don't know how to set the maximum input split size.&lt;BR /&gt;I would like to decrease this value in order to create more mappers.&lt;BR /&gt;I tried updating yarn-site.xml, but it does not work.&lt;BR /&gt;&lt;BR /&gt;Indeed, with Hadoop 2.2/YARN, none of the following settings has an effect on the input split size:&lt;BR /&gt;&lt;BR /&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;lt;name&amp;gt;mapreduce.input.fileinputformat.split.minsize&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;lt;name&amp;gt;mapreduce.input.fileinputformat.split.maxsize&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;lt;value&amp;gt;16777216&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;BR /&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;lt;name&amp;gt;mapred.min.split.size&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&amp;lt;property&amp;gt;&lt;BR /&gt;&amp;lt;name&amp;gt;mapred.max.split.size&amp;lt;/name&amp;gt;&lt;BR /&gt;&amp;lt;value&amp;gt;16777216&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;BR /&gt;Best&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 08:58:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/11566#M1677</guid>
      <dc:creator>rimm</dc:creator>
      <dc:date>2022-09-16T08:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: small input split size in hadoop2.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/11828#M1678</link>
      <description>The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, since this is a CDH users forum), as can be seen in the code: &lt;A target="_blank" href="https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L61"&gt;https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L61&lt;/A&gt; and &lt;A target="_blank" href="https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L335"&gt;https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java#L335&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.</description>
      <pubDate>Sat, 03 May 2014 14:28:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/11828#M1678</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2014-05-03T14:28:46Z</dc:date>
    </item>
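    <!--
    Editorial sketch, not part of the original thread. The reply above points at
    FileInputFormat; to the best of my understanding, that class derives the split
    size as max(minSize, min(maxSize, blockSize)) and then cuts each file into
    splits of that size, allowing the final split to be up to 10% larger. The
    standalone Java below mirrors that logic without depending on Hadoop. The class
    name, the 128 MiB block size, and the 1 GiB file length are assumptions chosen
    for illustration; 16777216 is the maxsize value from the question.

    ```java
    public class SplitSizeSketch {
        // Assumed tolerance: the last split may be up to 10% larger than splitSize.
        private static final double SPLIT_SLOP = 1.1;

        // Split size is the block size clamped between minSize and maxSize.
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        // Counts the splits produced for one splittable file of the given length.
        static int numSplits(long fileLength, long splitSize) {
            int splits = 0;
            long remaining = fileLength;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                remaining -= splitSize;
                splits++;
            }
            if (remaining != 0) {
                splits++; // whatever is left becomes the final split
            }
            return splits;
        }

        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024;   // hypothetical HDFS block size
            long fileLength = 1024L * 1024 * 1024; // hypothetical 1 GiB input file

            // No effective cap: maxSize left at Long.MAX_VALUE.
            long uncapped = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
            // Cap from the question: maxsize = 16777216 bytes (16 MiB).
            long capped = computeSplitSize(blockSize, 1L, 16777216L);

            System.out.println("uncapped splits: " + numSplits(fileLength, uncapped));
            System.out.println("capped splits:   " + numSplits(fileLength, capped));
        }
    }
    ```

    Under these assumptions the uncapped case yields 8 splits and the capped case
    yields 64, which is the "more mappers" effect the question is after. The sketch
    only applies when the property actually reaches the job configuration.
    -->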
    <item>
      <title>Re: small input split size in hadoop2.2</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/12130#M1679</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I downloaded the Hadoop 2.3 source (from &lt;A target="_blank" href="http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/"&gt;http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/&lt;/A&gt;), compiled it to run on 64-bit,&lt;/P&gt;&lt;P&gt;and used the following method to set the input split size:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="line"&gt;&amp;nbsp;&amp;nbsp;public static void setMaxInputSplitSize(Job job,&lt;/DIV&gt;&lt;DIV class="line"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;long size) {&lt;/DIV&gt;&lt;DIV class="line"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;job.getConfiguration().setLong(SPLIT_MAXSIZE, size);&lt;/DIV&gt;&lt;DIV class="line"&gt;&amp;nbsp;&amp;nbsp;}&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best&lt;/P&gt;</description>
      <pubDate>Thu, 08 May 2014 13:21:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/small-input-split-size-in-hadoop2-2/m-p/12130#M1679</guid>
      <dc:creator>rimm</dc:creator>
      <dc:date>2014-05-08T13:21:20Z</dc:date>
    </item>
  </channel>
</rss>

