Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

What is the purpose of the following two configuration parameters in mapred-site.xml? What are the recommended values?

mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize

Thanks 🙂

1 ACCEPTED SOLUTION

Master Mentor
@Jonas Straub

I found this really useful

Also, from the Apache docs:

Deprecated property name: mapred.min.split.size
New property name: mapreduce.input.fileinputformat.split.minsize

3 REPLIES

Thanks @Neeraj

I also found these two books:

Pro Apache Hadoop

Hadoop: The Definitive Guide

Both basically say that mapreduce.input.fileinputformat.split.minsize < dfs.blocksize < ...maxsize

SmartSense recommended: 105MB (minsize) and 270MB (maxsize)

Our current block size is 64MB, although SmartSense recommended a 128MB block size. With 128MB blocks the min/max recommendations match the descriptions from the books (105MB < 128MB < 270MB); with our current 64MB blocks, minsize would be larger than the block size.
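
To sanity-check these numbers, here is a minimal Python sketch of the rule that, to my understanding, FileInputFormat applies when choosing a split size: max(minsize, min(maxsize, blocksize)). The function name compute_split_size and the MB constant are my own, chosen for illustration; this is not Hadoop code, just the same arithmetic applied to the values from this thread.

```python
MB = 1024 * 1024  # bytes per megabyte (illustrative helper constant)


def compute_split_size(block_size, min_size, max_size):
    """Return the split size in bytes.

    Assumed rule: max(min_size, min(max_size, block_size)).
    The block size is used unless it falls outside [min_size, max_size].
    """
    return max(min_size, min(max_size, block_size))


# Values from this thread: minsize 105MB, maxsize 270MB.
min_size = 105 * MB
max_size = 270 * MB

# Current 64MB blocks: minsize wins, so each split spans more than one block.
print(compute_split_size(64 * MB, min_size, max_size) // MB)   # 105

# Recommended 128MB blocks: block size lies between min and max, so split == block.
print(compute_split_size(128 * MB, min_size, max_size) // MB)  # 128
```

If this rule holds, then keeping minsize < dfs.blocksize < maxsize means one split per block, while a minsize above the block size produces splits that cross block boundaries.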

Master Mentor

Thanks for sharing 🙂 @Jonas Straub