Support Questions

jstraub · ‎10-27-2015

What is the purpose of the following two configuration parameters in mapred-site.xml? What are the recommended values?

mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize

Thanks 🙂

nsabharwal · ‎10-27-2015

@Jonas Straub

I found this really useful

Also, from Apache doc

Deprecated property name

mapred.min.split.size

New

mapreduce.input.fileinputformat.split.minsize

jstraub · ‎10-28-2015

Thanks @Neeraj

I also found these two books:

Pro Apache Hadoop

Hadoop Definitive Guide

And both are basically saying that mapreduce.input.fileinputformat.split.minsize < dfs.blocksize < ...maxsize

SmartSense recommended: 105MB (minsize) and 270MB (maxsize)

Our current block size is 64MB, although SmartSense recommended a 128MB block size, so the min/max recommendations fit the recommended block size as well as the descriptions from the books.
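
To make the relationship between the three values concrete, here is a minimal sketch of how the split size falls out of them. It assumes the formula used by FileInputFormat, max(minsize, min(maxsize, blocksize)); the class name SplitSizeSketch and the helper below are made up for illustration and are not part of Hadoop.

```java
// Illustrative sketch only: SplitSizeSketch is a hypothetical class, not a Hadoop API.
public class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Assumed formula: the block size, clamped between minsize and maxsize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 105 * MB; // SmartSense recommendation from this thread
        long maxSize = 270 * MB; // SmartSense recommendation from this thread

        // 64MB blocks: minsize is larger than the block, so minsize wins.
        System.out.println(computeSplitSize(64 * MB, minSize, maxSize) / MB + "MB");

        // 128MB blocks: the block size lies between min and max, so it is used as is.
        System.out.println(computeSplitSize(128 * MB, minSize, maxSize) / MB + "MB");
    }
}
```

Under that assumption, a 64MB block size with these settings would give 105MB splits (each split spans more than one block), while a 128MB block size would give one split per block.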

nsabharwal · ‎10-28-2015

Thanks for sharing 🙂 @Jonas Straub

Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]