
Recommended value of mapreduce.client.submit.file.replication for large clusters


The documentation states that mapreduce.client.submit.file.replication should be set to roughly the square root of the number of nodes. What is the logic behind this formula? Is there anything to be cautious about when setting it to ~30+ on a 1000+ node cluster?

1 REPLY

Re: Recommended value of mapreduce.client.submit.file.replication for large clusters


@Sanjeev

When a job is submitted, its resources (the job JAR, configuration files, and the computed input splits) are copied to the cluster with this replication factor, so that many replicas exist across the cluster for NodeManagers to read when they launch the job's tasks.

A higher replication factor spreads the read load: at task-launch time, hundreds of nodes may fetch these files at once, and with only a few replicas those DataNodes become a bottleneck. Writing the files, on the other hand, gets slower as replication increases, so a square-root heuristic balances write cost against read contention. It should be fine to set this to a high value (~30) on a cluster of that size; the main cost is slightly slower job submission and some transient HDFS usage, since the staging files are cleaned up when the job completes.
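As a rough illustration of the heuristic, here is a small sketch (the helper name and the floor of 10, which matches the usual Hadoop default for this property, are my own assumptions, not from the docs):

```python
import math

def recommended_submit_file_replication(num_nodes: int, floor: int = 10) -> int:
    """Heuristic sketch: ~sqrt(cluster size), but never below the
    common Hadoop default of 10 for mapreduce.client.submit.file.replication."""
    return max(floor, round(math.sqrt(num_nodes)))

# A 1000-node cluster lands at ~32, consistent with the ~30+ in the question.
print(recommended_submit_file_replication(1000))
```

The resulting value would then go into mapred-site.xml (or the equivalent setting in Cloudera Manager / Ambari) as mapreduce.client.submit.file.replication.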
