
yarn + resource manager failed to start on HDP clusters

We have 2 ResourceManagers running as part of an HDP cluster.

The first ResourceManager fails a couple of minutes after it starts.

In the ResourceManager log, the following lines are repeated many times:

 


2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container container_e83_1624802728037_0001_01_004355 of capacity <memory:86016, vCores:5> on host datanode23.fgtf.com:45454, which has 5 containers, <memory:199680, vCores:15> used and <memory:27708, vCores:75> available after allocation

2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0004 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0003 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes:

 

 

and

2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.101.13:53092/api/v1/applications/application_1624802728037_0009/executors which is the app master GUI of application_1624802728037_0009 owned by hdfs
2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.14.5:36198/api/v1/applications/application_1624801538018_0011/executors which is the app master GUI of application_1624801538018_0011 owned by hdfs


Any idea what this INFO message means?

INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10

 

Michael-Bronson

Re: yarn + resource manager failed to start on HDP clusters

Mentor

@mike_bronson7 

Have you recently changed your YARN configs (CapacityScheduler or FairScheduler)? This seems related to oversubscription.
Can you share your queue configuration, i.e. capacity-scheduler.xml or fair-scheduler.xml?

Can you check Ambari UI --> YARN --> Config --> Version to ensure there wasn't a change?
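
As a rough illustration of what the FSAppAttempt message is reporting, here is a small sketch that parses one of the posted lines and recomputes the limit. It assumes numAllowedReservations is the ceiling of totalAvailableNodes x reservableNodesRatio, which is consistent with the logged values (181 x 0.05 = 9.05, rounded up to 10). The function names are made up for this example and are not part of YARN.

```python
import math
import re

# Matches the key=value fields in the FSAppAttempt message posted above.
FIELD_RE = re.compile(r"(\w+)=([\w.]+)")


def parse_reservation_line(line):
    """Return the key=value fields of a 'Reservation Exceeds' line as a dict."""
    return dict(FIELD_RE.findall(line))


def allowed_reservations(total_nodes, ratio):
    """Assumed rule: ceiling of (total nodes * reservable ratio)."""
    return math.ceil(total_nodes * ratio)


line = ("Reservation Exceeds Allowed number of nodes: "
        "app_id=application_1624802728037_0001 existingReservations=10 "
        "totalAvailableNodes=181 reservableNodesRatio=0.05 "
        "numAllowedReservations=10")

fields = parse_reservation_line(line)
limit = allowed_reservations(int(fields["totalAvailableNodes"]),
                             float(fields["reservableNodesRatio"]))
# True when the application already holds as many reservations as allowed.
at_limit = int(fields["existingReservations"]) >= limit
print(limit, at_limit)
```

Read this way, the message says that the application already holds reservations on 10 nodes, which is the most it is allowed on a 181-node cluster, so no further node is reserved for it. It is logged at INFO level.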
Geoffrey

Re: yarn + resource manager failed to start on HDP clusters

Here are the details:

 

 

capacity-scheduler=null
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default

Michael-Bronson

Re: yarn + resource manager failed to start on HDP clusters

and

 

 

yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled --> false

Michael-Bronson

Re: yarn + resource manager failed to start on HDP clusters

Based on the logs and on what you see, which scheduler should we select:

CapacityScheduler or FairScheduler?

Michael-Bronson

Re: yarn + resource manager failed to start on HDP clusters

Mentor

@mike_bronson7 

Can you share the files below from /var/log/hadoop-yarn/yarn?

  • hadoop-yarn-resourcemanager-{hostname}.log
  • hadoop-yarn-resourcemanager-{hostname}.out
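
A quick way to narrow down what to look for in those files is to count lines per level and logger, and to pull out anything at WARN or above. Below is a minimal sketch, assuming the log layout seen in the lines posted earlier (timestamp, level, logger, source location, message). The function names are hypothetical, and the sample uses only the start of two posted lines.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger (source) - message"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s+(\S+) ")


def summarize(lines):
    """Count lines per (level, logger) and collect WARN/ERROR/FATAL lines."""
    counts = Counter()
    problems = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # e.g. a stack-trace continuation line
        level, logger = match.groups()
        counts[(level, logger)] += 1
        if level in ("WARN", "ERROR", "FATAL"):
            problems.append(line.rstrip())
    return counts, problems


def summarize_file(path):
    """Run summarize() over a log file, tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return summarize(handle)


# Messages shortened; prefixes copied from the lines posted in this thread.
sample = [
    "2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode "
    "(SchedulerNode.java:allocateContainer(152)) - Assigned container",
    "2021-06-27 14:07:01,279 INFO fair.FSAppAttempt "
    "(FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation",
]
counts, problems = summarize(sample)
print(counts.most_common(), problems)
```

Calling summarize_file() on the .log file gives the same summary for the whole log. The lines in problems, especially the last ones before the ResourceManager stops, are usually more useful than the repeated INFO lines.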

Happy hadooping

Re: yarn + resource manager failed to start on HDP clusters

Mentor

@mike_bronson7 
Waiting for your response with the logs.

Re: yarn + resource manager failed to start on HDP clusters

The logs include some sensitive data, so I can't attach the full log content, but the lines I posted are the ones that appear most often in the logs. I also forgot to mention another problem: we can't access port 8088 either.
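
To tell whether the problem with port 8088 is the ResourceManager not listening at all or something on the network path, a plain TCP check could help. This is only a sketch; the helper name is made up, and the host passed to it would be one of the ResourceManager hosts.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False
```

For example, port_is_open("localhost", 8088) run on the ResourceManager host itself shows whether the process is listening, and the same call from another machine with the ResourceManager hostname shows whether the port is reachable over the network.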

Michael-Bronson

Re: yarn + resource manager failed to start on HDP clusters

Mentor

@mike_bronson7 

 

Are you using the default capacity scheduler settings? No queues/leaves created? Is what you shared the current setting?
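
One thing worth checking: the lines you posted come from the fair.FSAppAttempt logger, which belongs to the FairScheduler, while the settings you shared are CapacityScheduler properties. Below is a small sketch for confirming which scheduler class is set in yarn-site.xml through the yarn.resourcemanager.scheduler.class property. It assumes the usual Hadoop configuration/property XML layout. The sample XML is illustrative and not taken from your cluster, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SCHEDULER_KEY = "yarn.resourcemanager.scheduler.class"


def hadoop_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def scheduler_class(xml_text):
    """Return the configured scheduler class, or None if it is not set."""
    return hadoop_properties(xml_text).get(SCHEDULER_KEY)


# Illustrative content only, not the poster's actual yarn-site.xml.
sample = """<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>"""
print(scheduler_class(sample))
```

If the value ends in FairScheduler, then the capacity-scheduler settings above are not what governs the queues, and fair-scheduler.xml is the file to share.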