The documentation states that it should be around the square root of the number of nodes. What is the logic behind this formula? Is there anything to be cautious about when setting it to ~30+ on a 1000+ node cluster?
The point of this setting is that when a job is submitted, the resources needed to run it (the job JAR files, config files, and the computed input splits) are propagated across the cluster, so that there are plenty of copies for the NodeManagers to read when they execute the job's tasks.
It simply provides redundancy for the job resources while the tasks are executing. Setting it to a high value should be OK on a cluster that big.
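
For a sense of scale, the square-root rule of thumb from the question can be worked out with a few lines of Python. This is only a sketch: the function name `suggested_replication` is made up for illustration, and rounding to the nearest integer is my own assumption, since the documentation only says "around" the square root.

```python
import math


def suggested_replication(num_nodes: int) -> int:
    """Return round(sqrt(num_nodes)), with a floor of 1.

    Hypothetical helper: the rounding choice is an assumption,
    not something the documentation specifies.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return max(1, round(math.sqrt(num_nodes)))


if __name__ == "__main__":
    for nodes in (100, 1000, 2500):
        print(nodes, suggested_replication(nodes))
    # prints:
    # 100 10
    # 1000 32
    # 2500 50
```

So for a cluster of about 1000 nodes the rule lands at roughly 32, which is in line with the ~30+ value mentioned in the question.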