Our cluster is currently struggling with its workload: tens of thousands of jobs are being launched within a short time period. We've tuned YARN heavily and are now running into Oozie problems.
This raises the question: what is an appropriate number of jobs to run on a cluster? Is there a best practice based on YARN capacity or the number of DataNodes? I've done a lot of research and haven't found any clear guidance.