SYMPTOMS: If MaxApplications=1000 set in configuration than jobs would start getting stuck in "accepted" state after 500+ submissions. The following error is seen in resource manager logs:
caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1452018403088_0506 to YARN :
org.apache.hadoop.security.AccessControlException: Queue root.hive1 already has 1000 applications, cannot accept submission of application: application_1452018403088_0506
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:291) at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
... 18 more
ROOT CAUSE: This a known issue reported in internal jira BUG-50642. As of date this issue is unresolved. However in Hadoop 2.8.0 this behavior is not seen.
WORKAROUND: Please set yarn.scheduler.capacity.root.ordering-policy.fair.enable-size-based-weight=false