Resource Manager ends up in infinite loop making container reservations


New Contributor

I'm running CDH 5.11.1 with the FairScheduler and preemption enabled.

 

When the cluster is fully utilized (no free vcores), the resource manager will often end up in an infinite loop that looks like this:

 

2017-06-21 14:09:04,752 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1498049738482_0045_01_80455566 on node host: <HOSTNAME>:33692 #containers=40 available=96018 used=142208 for application application_1498049738482_0045
2017-06-21 14:09:04,752 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Making reservation: node=<HOSTNAME> app_id=application_1498049738482_0045

 

And by infinite loop, I mean it tries to reserve a container on the same node for the same application 30 times PER MILLISECOND until I restart the RM. This looks a whole lot like YARN-4477 (https://issues.apache.org/jira/browse/YARN-4477), but that fix was already backported to CDH 5.11.1, so it shouldn't be the same bug.
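
For anyone who wants to check their own RM log for the same pattern, here is a rough Python sketch that counts "Making reservation" lines per millisecond for each node/application pair. It is not part of CDH or YARN: the function name find_reservation_storms and the threshold of 10 are made up for illustration, and the regex assumes the log format quoted above.

```python
import re
from collections import Counter

# Matches FSAppAttempt "Making reservation" log entries in the format quoted
# in this post. The layout is an assumption; adjust it if your logs differ.
RESERVATION = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"\S*FSAppAttempt: Making reservation: "
    r"node=(?P<node>\S+) app_id=(?P<app>\S+)"
)


def find_reservation_storms(lines, threshold=10):
    """Return {(timestamp, node, app_id): count} for counts >= threshold.

    Timestamps have millisecond resolution, so each count is the number of
    reservation attempts logged in a single millisecond. The threshold is an
    arbitrary illustrative cutoff, not a YARN or CDH setting.
    """
    counts = Counter()
    for line in lines:
        m = RESERVATION.search(line)
        if m:
            counts[(m["ts"], m["node"], m["app"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Passing an open log file (for example find_reservation_storms(open(path)), where path is wherever your RM log lives) returns the milliseconds in which one node/app pair was reserved at least threshold times. A healthy scheduler should normally return an empty dict.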

 

Anyone else seen this?  Thanks!

7 REPLIES

Re: Resource Manager ends up in infinite loop making container reservations

Explorer
We just got hit with this this morning. I'm raising a ticket with Cloudera.

Re: Resource Manager ends up in infinite loop making container reservations

Explorer
Out of curiosity, did you ever see negative numbers for Reserved Memory? At one point I saw Reserved Memory as "-1.20 PB" which is pretty aggressive.

Re: Resource Manager ends up in infinite loop making container reservations

New Contributor

Never saw negative memory, but that particular cluster was vcore rather than memory constrained.  We ended up switching to Capacity Scheduler pretty easily with good results, fwiw.

Re: Resource Manager ends up in infinite loop making container reservations

Explorer

See attachment - we don't have 1.4 billion cores but it sure did try to reserve that many.


billions-of-cores.png

Re: Resource Manager ends up in infinite loop making container reservations

Explorer
Follow-up: we're being advised to roll back to CDH 5.11.0.

Re: Resource Manager ends up in infinite loop making container reservations

New Contributor

We also encountered this error after upgrading to CDH 5.11.1 and had to roll back as a result.

 

Does this bug mean that CDH 5.11.1 is an inherently broken release?

A statement from Cloudera would be great.

Re: Resource Manager ends up in infinite loop making container reservations

New Contributor