Small input split size in Hadoop 2.2

New Contributor

Hi,
I am using Hadoop 2.2 and don't know how to set the maximum input split size.
I would like to decrease this value in order to create more mappers.
I tried updating yarn-site.xml, but it does not work.

Indeed, with Hadoop 2.2 / YARN, none of the following settings has any effect on the input split size:

<property>
<name>mapreduce.input.fileinputformat.split.minsize</name>
<value>1</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.split.maxsize</name>
<value>16777216</value>
</property>

<property>
<name>mapred.min.split.size</name>
<value>1</value>
</property>
<property>
<name>mapred.max.split.size</name>
<value>16777216</value>
</property>

best

2 ACCEPTED SOLUTIONS

Mentor
The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, as this is a CDH users forum), as can be seen in the code: https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr... and https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr...

CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.
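
To illustrate why lowering the maximum split size should produce more mappers, here is a minimal sketch of how the two parameters combine with the block size. It assumes the rule used by FileInputFormat.computeSplitSize in the new MapReduce API, max(minSize, min(maxSize, blockSize)). The class name, the 128 MiB block size and the 1 GiB file size are hypothetical values chosen for the example, and the split count ignores the small slop Hadoop allows on the last split.

```java
// Standalone sketch; does not depend on Hadoop. All names here are illustrative.
class SplitSizeSketch {

    // Assumed to mirror FileInputFormat.computeSplitSize:
    // the block size, capped by maxSize, then raised to at least minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits (and so mappers) for one splittable file:
    // ceiling of fileSize / splitSize.
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // hypothetical HDFS block size
        long fileSize = 1024L * 1024 * 1024;  // hypothetical 1 GiB input file

        // No effective cap: the split size falls back to the block size.
        long uncapped = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // The 16 MiB cap from the question (16777216 bytes).
        long capped = computeSplitSize(blockSize, 1L, 16777216L);

        System.out.println("uncapped: " + uncapped + " bytes, "
                + estimateSplits(fileSize, uncapped) + " splits");
        System.out.println("capped:   " + capped + " bytes, "
                + estimateSplits(fileSize, capped) + " splits");
    }
}
```

Under these assumptions the 16 MiB cap takes the estimate from 8 splits to 64 for the same file, which is the effect the question is after.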


New Contributor

Hi,

I downloaded the Hadoop 2.3 source (from http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/), compiled it to run on 64-bit,

and used the following method to set the maximum input split size:

 

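  // Static helper on FileInputFormat (new MapReduce API); SPLIT_MAXSIZE is the
  // key "mapreduce.input.fileinputformat.split.maxsize". The size is in bytes.
  public static void setMaxInputSplitSize(Job job,
                                          long size) {
    job.getConfiguration().setLong(SPLIT_MAXSIZE, size);
  }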

 

 

best

