Support Questions

What's the recommended value of mapreduce.job.max.split.locations?

Solved


Guru

And what's its exact purpose? I understand the default value (10) is very low and that we should set it to the number of cluster nodes. What would be the impact of setting it to, say, 1000 or even 100000?

Thanks, experts!

1 ACCEPTED SOLUTION

Re: What's the recommended value of mapreduce.job.max.split.locations?

Explorer

This configuration has existed since MapReduce v1. It sets an upper limit on the number of DataNode locations recorded per job split, which protects the JobTracker from being overloaded by jobs with huge numbers of split locations. In YARN (Hadoop 2) this concern is lessened, because each job has its own ApplicationMaster (AM) instead of a shared JobTracker. However, it still affects the ResourceManager (RM): the RM can see heavy requests from an AM that tries to obtain many localities for its splits. When the limit is hit, the location list is truncated to the configured value, sacrificing a little data locality but removing the risk of the RM becoming a bottleneck.

Depending on your job's priority (I believe it is a per-job configuration now), you can leave it at the default (for lower- or normal-priority jobs) or increase it to a larger number. Increasing it beyond the number of DataNodes has the same effect as setting it to the DataNode count, since a split can have at most one location per DataNode.
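For reference, a minimal sketch of setting this per job from the command line (assuming the job's driver uses ToolRunner, so generic -D options are parsed; the jar name, class name, and paths below are placeholders):

```shell
# Cap split locations for this job only; 100 here assumes a cluster of ~100 DataNodes
hadoop jar my-app.jar com.example.MyJob \
  -D mapreduce.job.max.split.locations=100 \
  /input /output
```

Because it is a per-job property, you can leave the cluster-wide default alone and raise it only for the high-priority jobs that need better locality.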


2 REPLIES



Re: What's the recommended value of mapreduce.job.max.split.locations?

Guru

Thanks @Junping Du!
