Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

What is the purpose of the following two configuration parameters in mapred-site.xml? What are the recommended values?

mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize

Thanks 🙂

1 ACCEPTED SOLUTION

Master Mentor
@Jonas Straub

I found this really useful

Also, from the Apache docs:

Deprecated property name: mapred.min.split.size
New property name: mapreduce.input.fileinputformat.split.minsize

Thanks @Neeraj

I also found these two books:

Pro Apache Hadoop

Hadoop: The Definitive Guide

Both basically say that mapreduce.input.fileinputformat.split.minsize < dfs.blocksize < mapreduce.input.fileinputformat.split.maxsize

SmartSense recommended 105 MB (minsize) and 270 MB (maxsize).

Our current block size is 64 MB, but SmartSense also recommended a 128 MB block size, which fits between the recommended min/max values and matches the descriptions in the books.
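
For anyone who wants to see how these three values interact, here is a rough sketch. As far as I understand it, FileInputFormat picks the split size as max(minsize, min(maxsize, blocksize)). The function name compute_split_size below is just my own illustration in Python, not a Hadoop API, and the numbers are the ones from this thread.

```python
MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size into [min_size, max_size].

    Illustrative helper (not part of Hadoop): it mirrors the
    max(min, min(max, block)) rule described above.
    """
    return max(min_size, min(max_size, block_size))


# Values from this thread: the SmartSense recommendations for min/max.
MIN_SIZE = 105 * MB
MAX_SIZE = 270 * MB

for block_mb in (64, 128):
    split = compute_split_size(block_mb * MB, MIN_SIZE, MAX_SIZE)
    print(f"{block_mb} MB block -> {split // MB} MB split")
# 64 MB block -> 105 MB split
# 128 MB block -> 128 MB split
```

So with our current 64 MB blocks the minsize wins and each split covers more than one block, which can mean some of a map task's input is not local to the node it runs on. With 128 MB blocks the split size is simply the block size, which is the minsize < dfs.blocksize < maxsize situation the books describe.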

Master Mentor

Thanks for sharing 🙂 @Jonas Straub