small input split size in hadoop2.2

SOLVED

small input split size in hadoop2.2

New Contributor

Hi,
I am using Hadoop 2.2 and don't know how to set the maximum input split size.
I would like to decrease this value in order to create more mappers.
I tried updating yarn-site.xml, but it does not work.

Indeed, with Hadoop 2.2 / YARN, none of the following settings has any effect on the input split size:

<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>16777216</value>
</property>

<property>
  <name>mapred.min.split.size</name>
  <value>1</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>16777216</value>
</property>

best

2 ACCEPTED SOLUTIONS

Re: small input split size in hadoop2.2

Master Guru
The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, as this is a CDH users forum), as can be seen in the code: https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr... and https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr...

CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.
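
To illustrate (not part of the original reply): since these are job-level keys, one minimal sketch is to set them on the job's Configuration in the driver rather than in cluster XML files. This assumes the Hadoop 2.x "mapreduce" API; the class name and the 16 MB value are placeholders, not something from the thread.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Per-job overrides for the split-size keys named above.
      conf.setLong("mapreduce.input.fileinputformat.split.minsize", 1L);
      conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 16L * 1024 * 1024); // 16 MB
      Job job = Job.getInstance(conf, "small-split-job");
      // ... set the jar, mapper/reducer classes, and input/output paths as usual ...
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }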

Re: small input split size in hadoop2.2

New Contributor

Hi,

I downloaded the Hadoop 2.3 source (from http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/), compiled it to run under 64-bit, and used the following method to set the input split size:

  // From org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  // SPLIT_MAXSIZE is the constant for "mapreduce.input.fileinputformat.split.maxsize".
  public static void setMaxInputSplitSize(Job job,
                                          long size) {
    job.getConfiguration().setLong(SPLIT_MAXSIZE, size);
  }

best
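
As a hedged follow-up (not from the original reply): the method shown matches the public static helper on org.apache.hadoop.mapreduce.lib.input.FileInputFormat, so it can also be called from a driver against a stock Hadoop build without recompiling. The class name, input/output arguments, and 16 MB cap below are illustrative only.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class SmallSplitJob {
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "small-split-job");
      job.setJarByClass(SmallSplitJob.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      // Cap each input split at 16 MB so more map tasks are created.
      FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
      FileInputFormat.setMinInputSplitSize(job, 1L);
      // ... set mapper/reducer classes as usual ...
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }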
