Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

small input split size in hadoop2.2

Frequent Visitor

Hi,
I am using Hadoop 2.2 and don't know how to set the maximum input split size. I would like to decrease this value in order to create more mappers. I tried updating yarn-site.xml, but it does not work.

Indeed, with Hadoop 2.2/YARN, none of the following settings has any effect on the input split size:

<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>16777216</value>
</property>

<property>
  <name>mapred.min.split.size</name>
  <value>1</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>16777216</value>
</property>
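For reference on why lowering the max split size creates more mappers: in Hadoop 2.x, FileInputFormat clamps the HDFS block size between the min and max settings above when sizing each split. A minimal self-contained sketch of that arithmetic (the 128 MiB block size and 1 GiB file are assumed example values, not from this thread):

```java
public class SplitMath {
    // Mirrors FileInputFormat.computeSplitSize in Hadoop 2.x:
    //   splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // typical HDFS block size (assumed)
        long minSize = 1L;                   // mapreduce.input.fileinputformat.split.minsize
        long maxSize = 16L * 1024 * 1024;    // mapreduce.input.fileinputformat.split.maxsize
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);

        long fileSize = 1024L * 1024 * 1024; // a hypothetical 1 GiB input file
        long numSplits = (fileSize + splitSize - 1) / splitSize; // one mapper per split
        System.out.println(splitSize + " bytes per split, " + numSplits + " mappers");
    }
}
```

With the 16777216 max from the question, a 1 GiB file would be cut into 64 splits instead of 8 at the default 128 MiB block size.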

best

2 ACCEPTED SOLUTIONS

Mentor
The parameters mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize should work in CDH5 (assuming you're using CDH, as this is a CDH users forum), as can be seen in the code: https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr... and https://github.com/cloudera/hadoop-common/blob/cdh5.0.0-release/hadoop-mapreduce-project/hadoop-mapr...

CDH5 is Apache Hadoop 2.3.0 with backports from Apache Hadoop trunk.
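One common way to apply these as per-job settings rather than cluster-wide XML is on the command line, assuming the job driver runs through ToolRunner/GenericOptionsParser so that -D properties are picked up (the jar and class names below are placeholders):

```shell
hadoop jar myjob.jar com.example.MyDriver \
  -D mapreduce.input.fileinputformat.split.minsize=1 \
  -D mapreduce.input.fileinputformat.split.maxsize=16777216 \
  /input /output
```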


Frequent Visitor

Hi,

I downloaded the Hadoop 2.3 source (from http://mir2.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.3.0/), compiled it to run under 64 bits, and used the following method to set the input split size:

 

  // This is FileInputFormat.setMaxInputSplitSize from
  // org.apache.hadoop.mapreduce.lib.input.FileInputFormat; SPLIT_MAXSIZE is
  // that class's constant, "mapreduce.input.fileinputformat.split.maxsize".
  public static void setMaxInputSplitSize(Job job, long size) {
    job.getConfiguration().setLong(SPLIT_MAXSIZE, size);
  }

best

