Recommended config: mapreduce.input.fileinputformat.split.[minsize|maxsize]
Labels: Apache Hadoop
Created ‎10-27-2015 08:55 AM
What is the purpose of the following two configuration parameters in mapred-site.xml? What are the recommended values?
mapreduce.input.fileinputformat.split.minsize
mapreduce.input.fileinputformat.split.maxsize
Thanks 🙂
Created ‎10-27-2015 12:41 PM
I found this really useful
Also, from the Apache documentation:
Deprecated property name: mapred.min.split.size
New property name: mapreduce.input.fileinputformat.split.minsize
Created 10-28-2015 11:16 AM
Thanks @Neeraj
I also found these two books:
Both basically say that mapreduce.input.fileinputformat.split.minsize < dfs.blocksize < mapreduce.input.fileinputformat.split.maxsize
SmartSense recommended: 105 MB (minsize) and 270 MB (maxsize)
Our current block size is 64 MB, although SmartSense recommended a 128 MB block size. With 128 MB the min/max recommendations fit (105 MB < 128 MB < 270 MB), which also matches the descriptions from the books.
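For anyone checking their own values, here is a minimal sketch of how the three settings interact. It assumes the split size follows the rule max(minsize, min(maxsize, blocksize)), which is my understanding of how FileInputFormat picks it. The class and method names below are only illustrative and do not depend on Hadoop.

```java
public class SplitSizeSketch {
    public static final long MB = 1024L * 1024L;

    // Assumed rule: cap the block size at maxSize, then raise it to at least minSize.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 105 * MB; // SmartSense-recommended minsize from this thread
        long maxSize = 270 * MB; // SmartSense-recommended maxsize from this thread

        // Recommended 128 MB block size: the split equals one block.
        System.out.println(computeSplitSize(128 * MB, minSize, maxSize) / MB); // prints 128

        // Current 64 MB block size: minsize wins, so a split spans more than one block.
        System.out.println(computeSplitSize(64 * MB, minSize, maxSize) / MB);  // prints 105
    }
}
```

So under that assumption, with the current 64 MB blocks the 105 MB minimum would produce splits larger than a block, while with 128 MB blocks each split lines up with exactly one block.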
Created ‎10-28-2015 11:28 AM
Thanks for sharing 🙂 @Jonas Straub
