Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

small input split size in hadoop2.2

Frequent Visitor

Hi,
I am using Hadoop 2.2 and don't know how to set the maximum input split size. I would like to decrease this value in order to create more mappers. I tried updating yarn-site.xml, but it does not work.

Indeed, with Hadoop 2.2/YARN, none of the following settings has any effect on the input split size:

<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>16777216</value>
</property>

<property>
  <name>mapred.min.split.size</name>
  <value>1</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>16777216</value>
</property>
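For reference on why lowering the max split size creates more mappers: in Hadoop 2.x, FileInputFormat clamps the HDFS block size between the min and max settings above when sizing each split. A minimal self-contained sketch of that arithmetic (the 128 MiB block size and 1 GiB file are assumed example values, not from this thread):

```java
public class SplitMath {
    // Mirrors FileInputFormat.computeSplitSize in Hadoop 2.x:
    //   splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // typical HDFS block size (assumed)
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize
        long maxSize = 16L * 1024 * 1024;    // mapreduce.input.fileinputformat.split.maxsize
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        long fileSize = 1024L * 1024 * 1024; // a hypothetical 1 GiB input file
        long numSplits = (fileSize + splitSize - 1) / splitSize; // one mapper per split
        System.out.println(splitSize + " bytes per split, " + numSplits + " mappers");
    }
}
```

With the 16777216 max from the question, a 1 GiB file would be cut into 64 splits instead of 8 at the default 128 MiB block size.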

best

2 ACCEPTED SOLUTIONS

Mentor
The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, as this is a CDH users forum), as can be seen in the code: https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr... and https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr...

CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.
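One common way to apply these as per-job settings rather than cluster-wide XML is on the command line, assuming the job driver runs through ToolRunner/GenericOptionsParser so that -D properties are picked up (the jar and class names below are placeholders):

```shell
hadoop jar myjob.jar com.example.MyDriver \
  -D mapreduce.input.fileinputformat.split.minsize=1 \
  -D mapreduce.input.fileinputformat.split.maxsize=16777216 \
  /input /output
```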


Frequent Visitor

Hi,

I downloaded the Hadoop 2.3 source (from http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/), compiled it to run under 64 bits, and used the following method to set the input split size:

 

  // This is FileInputFormat.setMaxInputSplitSize from
  // org.apache.hadoop.mapreduce.lib.input.FileInputFormat; SPLIT_MAXSIZE is
  // that class's constant, "mapreduce.input.fileinputformat.split.maxsize".
  public static void setMaxInputSplitSize(Job job, long size) {
    job.getConfiguration().setLong(SPLIT_MAXSIZE, size);
  }

best

