YARN + ResourceManager failed to start on HDP clusters
- Labels: Apache Ambari
Created ‎06-27-2021 11:13 AM
We have two ResourceManagers running as part of an HDP cluster.
The first ResourceManager fails a couple of minutes after it starts.
In the ResourceManager log, the following lines are repeated many times:
2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container container_e83_1624802728037_0001_01_004355 of capacity <memory:86016, vCores:5> on host datanode23.fgtf.com:45454, which has 5 containers, <memory:199680, vCores:15> used and <memory:27708, vCores:75> available after allocation
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0004 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0003 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes:
and
2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.101.13:53092/api/v1/applications/application_1624802728037_0009/executors which is the app master GUI of application_1624802728037_0009 owned by hdfs
2021-06-27 14:07:01,282 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(382)) - dr.who is accessing unchecked http://22.20.14.5:36198/api/v1/applications/application_1624801538018_0011/executors which is the app master GUI of application_1624801538018_0011 owned by hdfs
Any idea what this INFO message means?
INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes: app_id=application_1624802728037_0001 existingReservations=10 totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10
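For anyone reading the message above, a short script can pull out its key=value fields. This is only a sketch: the helper names `parse_fields` and `reservation_summary` are made up for illustration, and the relationship `numAllowedReservations = ceil(totalAvailableNodes * reservableNodesRatio)` is an assumption that happens to match the logged numbers (181 * 0.05 = 9.05, rounded up to 10), not something confirmed in this thread. The logger name `fair.FSAppAttempt` suggests the Fair Scheduler is the one producing the message.

```python
import math
import re

# The log message quoted in the question (timestamp omitted).
LINE = (
    "INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - "
    "Reservation Exceeds Allowed number of nodes: "
    "app_id=application_1624802728037_0001 existingReservations=10 "
    "totalAvailableNodes=181 reservableNodesRatio=0.05 numAllowedReservations=10"
)


def parse_fields(line):
    """Return the key=value pairs found in the message as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def reservation_summary(line):
    """Compare the logged reservation limit with the assumed ceil() formula."""
    fields = parse_fields(line)
    total_nodes = int(fields["totalAvailableNodes"])
    ratio = float(fields["reservableNodesRatio"])
    existing = int(fields["existingReservations"])
    allowed = int(fields["numAllowedReservations"])
    return {
        "app_id": fields["app_id"],
        # Assumption: the limit is the node count times the ratio, rounded up.
        "computed_allowed": math.ceil(total_nodes * ratio),
        "logged_allowed": allowed,
        # True when the application already holds as many reservations as allowed.
        "limit_reached": existing >= allowed,
    }


if __name__ == "__main__":
    print(reservation_summary(LINE))
```

Read this way, the message says the application already holds reservations on 10 nodes, which is the most it is allowed at that ratio, so no further node is reserved for it. It is logged at INFO level, so on its own it does not look like the reason the ResourceManager stops.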
Created ‎06-29-2021 05:55 AM
Have you recently changed your YARN scheduler configs (CapacityScheduler or FairScheduler)? This seems related to oversubscription.
Can you share your queue configuration, i.e. capacity-scheduler.xml or fair-scheduler.xml?
Can you check Ambari UI --> YARN --> Config --> Version to make sure there wasn't a change?
Geoffrey
Created ‎06-30-2021 04:55 AM
Here are the details:
capacity-scheduler=null
yarn.scheduler.capacity.default.minimum-user-limit-percent=100
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default
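To read the settings above at a glance, they can be grouped by queue with a few lines of Python. This is a rough sketch that assumes the properties are pasted as plain key=value lines; the helper names `parse_props`, `child_queues` and `child_capacity_total` are hypothetical. It lists the queues under `root` and adds up their capacities, which the Capacity Scheduler expects to total 100 at each level.

```python
# A subset of the properties posted above, as plain key=value text.
PROPS = """\
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default
"""

PREFIX = "yarn.scheduler.capacity."


def parse_props(text):
    """Parse key=value lines into a dict, skipping blank or malformed lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def child_queues(props, parent="root"):
    """Return the child queue names configured under the given parent queue."""
    raw = props.get(f"{PREFIX}{parent}.queues", "")
    return [q.strip() for q in raw.split(",") if q.strip()]


def child_capacity_total(props, parent="root"):
    """Sum the configured capacity (in percent) of all children of parent."""
    return sum(
        float(props[f"{PREFIX}{parent}.{queue}.capacity"])
        for queue in child_queues(props, parent)
    )


if __name__ == "__main__":
    props = parse_props(PROPS)
    print("queues under root:", child_queues(props))
    print("total child capacity:", child_capacity_total(props))
```

For the values posted here, the only queue under `root` is `default`, holding the full capacity.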
Created ‎06-30-2021 04:56 AM
and
yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.enabled --> false
Created ‎07-01-2021 01:29 AM
Based on the logs and on what you see, which scheduler would you recommend we use:
CapacityScheduler or FairScheduler?
Created on ‎07-01-2021 04:44 AM - edited ‎07-01-2021 04:45 AM
Can you share the files below from /var/log/hadoop-yarn/yarn?
- hadoop-yarn-resourcemanager-{hostname}.log
- hadoop-yarn-resourcemanager-{hostname}.out
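If the full files cannot be shared, a summary of which loggers and levels dominate the .log file may still help. Below is a minimal sketch, assuming the log lines start with a timestamp, level and logger name in the same layout as the lines quoted earlier in this thread; the function names `summarize` and `summarize_file` are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode (SchedulerNode.java:...) - ...
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) "
)


def summarize(lines):
    """Count log lines per (level, logger); lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


def summarize_file(path):
    """Summarize a log file on disk, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return summarize(handle)


if __name__ == "__main__":
    sample = [
        "2021-06-27 14:07:16,022 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container",
        "2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes:",
        "2021-06-27 14:07:01,279 INFO fair.FSAppAttempt (FSAppAttempt.java:reservationExceedsThreshold(495)) - Reservation Exceeds Allowed number of nodes:",
    ]
    for (level, logger), count in summarize(sample).most_common():
        print(count, level, logger)
```

Calling `summarize_file` on the ResourceManager .log file and looking at the entries whose level is not INFO is one way to find the lines worth posting, since the output contains only counts, levels and logger names.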
Happy hadooping
Created ‎07-02-2021 03:03 AM
@mike_bronson7
Waiting for your response with the logs.
Created ‎07-17-2021 11:43 PM
The logs include some sensitive data, so I can't attach the full log content, but the lines I posted are the ones that appear most often in the logs. I also forgot to mention another problem: we can't access port 8088 either.
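A quick way to tell whether anything is listening on port 8088 (the default ResourceManager web UI port) is a plain TCP connect from another machine. This is a sketch using only the Python standard library; the function name `port_is_open` and the host name in the usage line are placeholders, not values from this cluster.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # "rm-host.example.com" is a placeholder for the ResourceManager host.
    print(port_is_open("rm-host.example.com", 8088))
```

A result of False while the ResourceManager process is running would point at a firewall or bind-address issue; False because the process has exited would match the failure described at the top of the thread.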
Created ‎07-18-2021 02:10 PM
Are you using the default capacity scheduler settings? No queues/leaves created? Is what you shared the current setting?
