Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

What is the purpose of the following two configuration parameters in mapred-site.xml? What are the recommended values?

mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize

Thanks :)

Accepted Solution

Re: Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

@Jonas Straub

I found this really useful

Also, from the Apache documentation:

Deprecated property name: mapred.min.split.size
New property name: mapreduce.input.fileinputformat.split.minsize
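In practice this means a job or config file that still sets the old name is treated as if it set the new one; Hadoop's Configuration class keeps a table of deprecated keys and their replacements. The Java sketch below only models that lookup with a plain map. The class SplitPropertyNames and its resolve method are hypothetical, not Hadoop's API, and the maxsize pair (mapred.max.split.size) is the matching entry from the same deprecation list.

```java
import java.util.Map;

// Hypothetical model of deprecated-name resolution for the split-size keys.
// This is NOT Hadoop's Configuration API; it only mirrors the old -> new mapping.
class SplitPropertyNames {
    // Old (deprecated) name -> current name.
    static final Map<String, String> DEPRECATED = Map.of(
        "mapred.min.split.size", "mapreduce.input.fileinputformat.split.minsize",
        "mapred.max.split.size", "mapreduce.input.fileinputformat.split.maxsize");

    // Returns the current name, or the input unchanged if it is not deprecated.
    static String resolve(String name) {
        return DEPRECATED.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        System.out.println(resolve("mapred.min.split.size"));
        System.out.println(resolve("mapreduce.input.fileinputformat.split.maxsize"));
    }
}
```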

Replies

Re: Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

Thanks @Neeraj

I also found these two books:

Pro Apache Hadoop

Hadoop: The Definitive Guide

Both essentially say that mapreduce.input.fileinputformat.split.minsize < dfs.blocksize < ...maxsize.
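That ordering follows from how the split size is derived. In the new-API FileInputFormat, computeSplitSize(blockSize, minSize, maxSize) returns max(minSize, min(maxSize, blockSize)). So a split is exactly one HDFS block whenever minsize <= blocksize <= maxsize. Raising minsize above the block size makes splits larger, and lowering maxsize below it makes them smaller. The Java sketch below reproduces only that formula in a standalone class (SplitSizeDemo is a made-up name). It assumes the defaults behave as minsize = 1 byte and maxsize = Long.MAX_VALUE.

```java
// Standalone sketch of the split-size rule used by Hadoop's FileInputFormat.
class SplitSizeDemo {
    static final long MB = 1024L * 1024L;

    // Split size = max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 128 * MB;
        // Defaults (assumed): minsize 1 byte, maxsize Long.MAX_VALUE -> one block per split.
        System.out.println(computeSplitSize(block, 1L, Long.MAX_VALUE) / MB);      // 128
        // minsize above the block size -> larger splits.
        System.out.println(computeSplitSize(block, 256 * MB, Long.MAX_VALUE) / MB); // 256
        // maxsize below the block size -> smaller splits.
        System.out.println(computeSplitSize(block, 1L, 64 * MB) / MB);             // 64
    }
}
```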

SmartSense recommended 105 MB (minsize) and 270 MB (maxsize).

Our current block size is 64 MB, although SmartSense recommended a 128 MB block size. That 128 MB block size falls between the recommended minsize and maxsize (105 MB < 128 MB < 270 MB), so it fits the min/max recommendations as well as the descriptions in the books.
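To see what these numbers mean for a job, the Java sketch below applies split size = max(minsize, min(maxsize, blocksize)) to the SmartSense values. It then counts the splits for one hypothetical 1 GiB file. The sketch rests on several assumptions: MB means 1024 * 1024 bytes, the input is a single splittable file, and the counting loop is modeled on FileInputFormat.getSplits, including its SPLIT_SLOP factor of 1.1 that lets the last split run up to 10% over. The class and method names are made up for illustration.

```java
// Hypothetical worked example: SmartSense min/max with a 64 MB vs 128 MB block size.
class SmartSenseSplits {
    static final long MB = 1024L * 1024L;
    // The last split may be up to 10% larger than splitSize (modeled on FileInputFormat).
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts splits for a single splittable file of fileSize bytes.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // tail split for whatever is left
        }
        return splits;
    }

    public static void main(String[] args) {
        long min = 105 * MB, max = 270 * MB;
        long file = 1024 * MB; // hypothetical 1 GiB input file
        for (long block : new long[] {64 * MB, 128 * MB}) {
            long split = computeSplitSize(block, min, max);
            System.out.printf("block=%d MB split=%d MB splits=%d%n",
                block / MB, split / MB, countSplits(file, split));
        }
    }
}
```

With 64 MB blocks the 105 MB minsize wins, so each split spans more than one block and part of it is likely read from a non-local node. With 128 MB blocks the split size equals the block size (about 8 splits instead of 10 for this file).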

Highlighted

Re: Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]

Thanks for sharing :) @Jonas Straub
