
When I submit a first job, it runs fine. When the same user submits a second job to the same queue, it stays in the ACCEPTED state and never moves to RUNNING, even though resources are available. Why?

Re: When I submit a first job, it runs fine. When the same user submits a second job to the same queue, it stays in the ACCEPTED state and never moves to RUNNING, even though resources are available. Why?

Mentor

@Ram D you have an oversubscribed queue: it shows a total of 14 GB of RAM on your cluster with 14 GB used, so the second job won't make any progress until the first job finishes. An easy test is to kill the first job and watch the second job go from ACCEPTED to RUNNING.

Expert Contributor

The Capacity Scheduler must have a default partition for the AM container calculation. We added two more nodes as the default partition, and now the queue accepts multiple jobs and works fine. The AM limit calculation has a bug in 2.7: without a default partition, we can't configure the Capacity Scheduler to make use of the entire cluster. Without a default partition, only node resources are available as entire cluster resources, so the second job cannot run and remains in the ACCEPTED state.
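To illustrate why adding unlabeled nodes helped, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not the actual YARN code: it assumes the AM limit is computed from the default (unlabeled) partition only. The function name `am_limit_mb`, the label name, the node sizes, the queue capacity of 1.0, and the AM percentage of 0.1 are all assumptions made for illustration.

```python
# Simplified model (an assumption, not the real scheduler logic): the AM limit
# is derived only from nodes in the default (unlabeled) partition.

def am_limit_mb(nodes, queue_capacity=1.0, max_am_percent=0.1):
    """Return the AM memory limit in MB, counting only unlabeled nodes.

    nodes is a list of (label, memory_mb) pairs; label is None for a node
    that belongs to the default partition.
    """
    default_partition_mb = sum(mb for label, mb in nodes if label is None)
    return default_partition_mb * queue_capacity * max_am_percent

# Hypothetical cluster where every node carries a label, so the default
# partition is empty and the computed AM limit collapses to zero.
all_labeled = [("labelA", 7168), ("labelA", 7168)]

# The same cluster after adding two unlabeled nodes (the default partition).
with_default = all_labeled + [(None, 7168), (None, 7168)]

print(am_limit_mb(all_labeled))
print(am_limit_mb(with_default))
```

Under this model, a cluster with no default partition has an AM limit of zero, so only the first application can start and any later one waits in ACCEPTED. Once unlabeled nodes exist, the limit becomes large enough to admit more AMs.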

Mentor

@Ram D Are you talking about Hadoop 2.7.1? 2.7.2 was just released, but it hasn't trickled down to HDP yet. Could you post an article with your findings? It would be valuable for the HCC community.

Expert Contributor

Here is the link from Apache JIRA regarding my issue.

https://issues.apache.org/jira/browse/YARN-3216

Rising Star

@Ram D

I see that the warning you mentioned in the log was:

"not starting application as amIfStarted exceeds amLimit"

This indicates that the yarnClientApplication request to launch the ApplicationMaster container cannot get resources in the queue where the AM container is being launched. The maximum percentage of resources allowed for AM containers has been reached in that queue. You can wait for the currently running AM container to complete, or try increasing the threshold.

I would suggest first taking a quick look at "maxAMResourcePerQueuePercent" for the queue where this ApplicationMaster is launched. You can check it in the Capacity Scheduler configuration in YARN. Try increasing "maxAMResourcePerQueuePercent" to a higher value. This raises the threshold and allows more AM containers to run in the queue.
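The effect of raising the threshold can be sketched in Python. This is a simplified model of the admission check behind the "amIfStarted exceeds amLimit" warning, not the scheduler's actual implementation. The function name `count_active`, the 1024 MB AM container size, and the queue capacity of 1.0 are assumptions; the 14 GB cluster size comes from the thread. In the Capacity Scheduler configuration, the corresponding setting is usually `yarn.scheduler.capacity.maximum-am-resource-percent` (default 0.1), which can also be set per queue.

```python
def count_active(num_apps, am_mb, cluster_mb, max_am_percent, queue_capacity=1.0):
    """Count how many submitted applications get their AM started.

    Simplified assumption: the queue's AM limit is
    cluster memory * queue capacity * max AM percent, and the first
    application is always allowed to start.
    """
    am_limit = cluster_mb * queue_capacity * max_am_percent
    am_used = 0
    active = 0
    for _ in range(num_apps):
        am_if_started = am_used + am_mb
        # Mirrors the log message: "amIfStarted exceeds amLimit".
        if active > 0 and am_if_started > am_limit:
            break  # this and later applications stay in ACCEPTED
        am_used = am_if_started
        active += 1
    return active

CLUSTER_MB = 14 * 1024  # 14 GB cluster, as reported in the thread

print(count_active(3, 1024, CLUSTER_MB, 0.1))  # 1: second job stays ACCEPTED
print(count_active(3, 1024, CLUSTER_MB, 0.5))  # 3: all AMs fit under the limit
```

With the assumed numbers, a threshold of 0.1 gives an AM limit of about 1.4 GB, which a single 1 GB AM nearly fills, so the second job cannot start. Raising the threshold to 0.5 leaves room for all three AMs.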