Welcome to the Cloudera Community

adossett · 06-21-2017

I'm running CDH 5.11.1 with the FairScheduler and preemption enabled.

When the cluster is fully utilized (no free vcores), the ResourceManager often ends up in an infinite loop that looks like this:

2017-06-21 14:09:04,752 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Updated reserved container container_1498049738482_0045_01_80455566 on node host: <HOSTNAME>:33692 #containers=40 available=96018 used=142208 for application application_1498049738482_0045
2017-06-21 14:09:04,752 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: Making reservation: node=<HOSTNAME> app_id=application_1498049738482_0045

And by infinite loop, I mean it tries to reserve a container on the same node for the same application 30 times PER MILLISECOND until I restart the RM. This looks a whole lot like YARN-4477 (https://issues.apache.org/jira/browse/YARN-4477), but that fix was backported to CDH 5.11.1.
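For anyone who wants to check whether they are hitting the same thing, here is a rough sketch of how the reservation rate could be measured from the RM log. The regular expression is only an assumption based on the two log entries quoted above, and the names count_reservations and RESERVATION are made up for illustration; adjust both if your log format differs.

```python
import re
from collections import Counter

# Assumed pattern, derived from the "Making reservation" entry quoted in the post.
# It captures the millisecond timestamp, the node, and the application id.
RESERVATION = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+FSAppAttempt: "
    r"Making reservation: node=(\S+) app_id=(\S+)"
)


def count_reservations(lines):
    """Count reservation attempts per (timestamp, node, application id)."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several log entries fused onto one line.
        for match in RESERVATION.finditer(line):
            counts[match.groups()] += 1
    return counts
```

Passing an open log file to count_reservations and then calling most_common(5) on the result lists the busiest (timestamp, node, application) combinations. A healthy scheduler should show small counts per millisecond; the loop described here shows the same node and application repeated dozens of times under a single timestamp.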

Anyone else seen this? Thanks!

Resource Manager ends up in infinite loop making container reservations