When I submit a first job, it runs perfectly well. When I submit a second job to the same queue as the same user, it does not change from the ACCEPTED state to RUNNING, even though resources are available. What is the reason?


Expert Contributor

I configured exclusive node labels and assigned two nodes to each label; each node has 15GB of memory and 4 cores. The root queue has four parent queues at 25% each. Each node label therefore has 26GB of memory and 8 cores, which come from its two assigned nodes.


Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

@Ram D

You may want to recheck the available memory; I have seen this happen when there were not enough resources available.

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

Expert Contributor

1296-int-wv-queue-issue.jpg

To run, the application needs 4 cores and 6GB of memory. As the attachment shows, there are 4 more cores and 18GB of memory available, but the application is not changing to the RUNNING state. After digging through the logs, I saw one error: "not starting application as amIfStarted exceeds amLimit". Which parameter do I need to change?

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

@Ram D Thanks for sharing the screenshot. What is your HDP version?

You are using Hadoop 2.7.1, and this bug is fixed in Hadoop 2.8:

https://issues.apache.org/jira/browse/YARN-3789

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

Expert Contributor

I configured the int-wv queue to use a minimum of 25% and a maximum of 100% of the node label internal-wv. I am using the same user to submit both applications; does that matter? Which configuration parameters let the queue consume the entire node label capacity? What values should I set for user limit factor, maximum AM resource, and minimum user limit in the internal-wv queue? At present, this queue is using 23.6% of the node label capacity. There are still 4 cores and 18GB of memory available, but the application is not changing to the RUNNING state.
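For reference, the three settings named in the question map to Capacity Scheduler properties under `yarn.scheduler.capacity.<queue-path>`. The sketch below only builds the property names and prints them; it does not touch a cluster. The queue path `root.int-wv` is an assumption (the real path depends on which parent queue int-wv sits under), and the values are illustrative rather than recommendations. A user limit factor of 4 is shown because a single user can use at most the queue's configured capacity multiplied by that factor, so 25% times 4 would reach 100%.

```python
PREFIX = "yarn.scheduler.capacity"


def queue_properties(queue_path, label, user_limit_factor,
                     min_user_limit_percent, max_am_percent):
    """Build capacity-scheduler.xml property names and values for one queue.

    queue_path and label are supplied by the caller; nothing here is read
    from a live cluster.
    """
    base = f"{PREFIX}.{queue_path}"
    return {
        # Share of the node label guaranteed to the queue, and its ceiling.
        f"{base}.accessible-node-labels.{label}.capacity": "25",
        f"{base}.accessible-node-labels.{label}.maximum-capacity": "100",
        # One user may use up to capacity * user-limit-factor.
        f"{base}.user-limit-factor": str(user_limit_factor),
        f"{base}.minimum-user-limit-percent": str(min_user_limit_percent),
        # Fraction of the queue's resources that AM containers may occupy.
        f"{base}.maximum-am-resource-percent": str(max_am_percent),
    }


if __name__ == "__main__":
    # "root.int-wv" is a hypothetical queue path; the values are illustrative.
    props = queue_properties("root.int-wv", "internal-wv", 4, 100, 0.5)
    for name, value in props.items():
        print(f"{name}={value}")
```

After editing capacity-scheduler.xml, the queues are normally reloaded with `yarn rmadmin -refreshQueues`.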

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

@Ram D There is a chance that you are hitting https://issues.apache.org/jira/browse/YARN-3789

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

Hello Ram

You could check a couple of things. Generally speaking, this can be caused by an inability to obtain the right resources. Aside from compute resources (containers), there are also constraints on AM containers. YARN queues have configuration for many different resources, including the two just mentioned. I have come across a situation where the YARN maximum AM resource configuration did not leave enough room for all the AM containers that were needed, and applications got stuck in this accepted-but-not-running state. Check the value of that config as well.

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

Expert Contributor

Which configuration change fixed your issue?

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

I added resources to the YARN max AM config, raising it to 50%. If you see that your query grabs only a few containers, but not enough, then you could look at this config.
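To see why raising the AM percentage can unblock a second application, here is a rough model of the check behind the "amIfStarted exceeds amLimit" log message. It is a simplification, not the scheduler's actual code: it assumes the AM limit is the queue's share of the node label multiplied by the maximum AM resource fraction, it ignores rounding to the minimum allocation and per-user AM limits, and it assumes that the limit is not enforced when the queue has no active application (which would explain why the first job always starts). The 1024 MB AM size is a made-up number; the 26GB label size and 25% queue capacity come from the question.

```python
MB_PER_GB = 1024


def am_limit_mb(partition_mb, queue_capacity, max_am_percent):
    """Simplified AM limit: the queue's share of the label times the AM fraction."""
    return partition_mb * queue_capacity * max_am_percent


def can_activate(active_apps, am_used_mb, new_am_mb, limit_mb):
    """Return True if a pending application would move out of ACCEPTED.

    Assumption: with no active application in the queue the limit is
    skipped, so the first application is always allowed to start.
    """
    if active_apps < 1:
        return True
    am_if_started = am_used_mb + new_am_mb
    return am_if_started <= limit_mb


if __name__ == "__main__":
    partition_mb = 26 * MB_PER_GB  # node label size from the question
    queue_capacity = 0.25          # int-wv minimum capacity from the question
    am_mb = 1024                   # hypothetical AM container size

    for percent in (0.1, 0.5):
        limit = am_limit_mb(partition_mb, queue_capacity, percent)
        second_starts = can_activate(1, am_mb, am_mb, limit)
        print(f"max AM fraction {percent}: limit {limit:.0f} MB, "
              f"second application starts: {second_starts}")
```

Under these assumed numbers, the second application is held back at a fraction of 0.1 and admitted at 0.5, even though plenty of memory and cores remain free for ordinary containers.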

Re: When i am submitting a first job its running perfectly good. When i submit the job in the same queue by the user, even the resources are available,job is not changing from ACCEPTED state to RUNNING. Reason?

Ram - I would suggest reviewing the YARN Capacity Scheduler guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_yarn_resource_mgt/content/ch_capacity_sch....

This can be a bit complicated, depending on how you have assigned the resources. Also monitor the Capacity Scheduler view. If you are stuck, you can always kill the blocking application, if you do not need it, using "yarn application -kill <appid>" from the shell. Before doing that, review the Capacity Scheduler. Apart from memory, also make sure that you have enough cores available.
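If you want to script this, a minimal sketch follows. It wraps two `yarn application` subcommands: `-list -appStates ACCEPTED` to see what is waiting, and `-kill` to remove an application. It assumes the `yarn` CLI is on the PATH of the machine running the script; the helper names and the application ID in the usage note are hypothetical.

```python
import re
import subprocess

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def list_accepted_command():
    """Command that lists applications still waiting in the ACCEPTED state."""
    return ["yarn", "application", "-list", "-appStates", "ACCEPTED"]


def kill_command(app_id):
    """Command that kills one application, after validating the ID format."""
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError(f"not a YARN application id: {app_id!r}")
    return ["yarn", "application", "-kill", app_id]


def run(command):
    """Run a yarn CLI command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(run(list_accepted_command()))` shows the waiting applications, and `run(kill_command("application_1450000000000_0002"))` would kill the one with that (made-up) ID. Passing the arguments as a list avoids shell quoting problems.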
