New Contributor
Posts: 3
Registered: ‎07-11-2015

JOB Stuck in Accepted State

Hi,

I am facing a strange issue. I have a single-node install of CDH 5.4.

I am trying to run Spark jobs. I see that only the first job runs, and any jobs submitted after the first one get stuck in the ACCEPTED state.

 

What could be the issue? Are there any limits that I might have accidentally set?

 

Thanks,

Baahu

Posts: 1,533
Kudos: 276
Solutions: 233
Registered: ‎07-31-2013

Re: JOB Stuck in Accepted State

Your NodeManager's offered memory may be too low for the amount of memory the applications/jobs are demanding. This is a common cause of a job waiting in the ACCEPTED state, awaiting more resources before it can run.

You can raise the CM -> YARN -> Configuration -> "Container Memory" field to a higher value to resolve this.

This problem is also typically only seen on small installations of 1-3 nodes.
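
If you manage the configuration by hand rather than through CM, the field above maps, as far as I know, to yarn.nodemanager.resource.memory-mb in yarn-site.xml. A minimal sketch, with the 8 GB value chosen purely for illustration:

<!-- yarn-site.xml: total memory this NodeManager offers to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value> <!-- example: offer 8 GB; raise if jobs queue up -->
</property>

Restart the NodeManager afterwards so the ResourceManager sees the new capacity.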
Backline Customer Operations Engineer
Contributor
Posts: 34
Registered: ‎07-27-2015

Re: JOB Stuck in Accepted State


Does the FairScheduler take only memory into consideration when making a decision, or does it also use vcores? If it can depend on multiple factors, then this may be another CR wherein the user can find out the exact reason (possibly through an API call) why an app is in the ACCEPTED state (memory, cores, disk space, queue limits, etc.).

Posts: 1,533
Kudos: 276
Solutions: 233
Registered: ‎07-31-2013

Re: JOB Stuck in Accepted State

CPUs are taken into account equally, if the request asks for them.
Backline Customer Operations Engineer
Cloudera Employee
Posts: 241
Registered: ‎01-16-2014

Re: JOB Stuck in Accepted State

What the FairScheduler takes into account depends on the scheduling policy you have chosen: DRF, Fair, or FIFO. The default is DRF, which takes both memory and CPU into account.
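
For reference, the policy can be set per queue, or as a default for all queues, in the Fair Scheduler allocation file. A minimal sketch, with the queue layout invented for illustration:

<!-- fair-scheduler.xml: valid policies are fifo, fair and drf -->
<allocations>
  <queue name="root">
    <schedulingPolicy>drf</schedulingPolicy> <!-- weigh memory and CPU -->
  </queue>
  <!-- fallback for queues that do not set a policy themselves -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
</allocations>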

 

An application that asks for more resources than the cluster can ever accommodate will be rejected: if I request 100 GB for a container and the maximum container size is only 64 GB, the request is refused. However, if I ask for 32 GB and the maximum container size is 64 GB, but there is no node large enough to provide the 32 GB, the application will just sit there forever (YARN-56). If the maximum container size is 64 GB but no node can accommodate a container that large, it will most likely just sit there too.

I am not sure what would happen if I requested a 32 GB container in a queue whose maximum resources are only 16 GB, whether it would be rejected or just sit there forever; I have not tested that case.
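
To make those limits concrete, these are, as far as I know, the settings involved in the cases above; the queue name and the numbers are only illustrative:

<!-- yarn-site.xml: the largest single container the scheduler will grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>65536</value> <!-- the 64 GB "maximum container" above -->
</property>

<!-- yarn-site.xml: what one NodeManager can actually offer; if no node
     reaches the requested container size, the app waits forever (YARN-56) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>

<!-- fair-scheduler.xml: the per-queue cap from the untested 16 GB case -->
<queue name="somequeue">
  <maxResources>16384 mb, 8 vcores</maxResources>
</queue>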

So you might have a misconfiguration, or you may just have run into a bug.

 

 

BTW: whatever was said above about memory is also true for vcores.

 

Wilfred

New Contributor
Posts: 3
Registered: ‎04-28-2016

Re: JOB Stuck in Accepted State

I have a single-node system just for doing minimal testing. I have this exact situation, but my laptop only has 16 GB to offer.

 

How do I set/configure the container memory to enable a job to run (get past the ACCEPTED state)? Do I raise the container maximum to 16 GB, or do I raise it to a value the machine can never provide, like 64 GB?
