Created 09-04-2022 03:45 AM
Hi,
I switched to the Fair Scheduler, but I still can't start the Resource Manager.
I'm using CDP 7.4.4
Following steps listed here
https://community.cloudera.com/t5/Support-Questions/Unable-to-start-Node-Manager/td-p/285976
I made the following changes via UI and verified the deployed files
In yarn_site.xml
I can see from the Resource Manager log that the Fair Scheduler allocation file is loaded:
2022-09-04 10:39:43,377 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file file:/run/cloudera-scm-agent/process/1546336134-yarn-RESOURCEMANAGER/fair-scheduler.xml
[root@10-222-53-95 1546336089-yarn-RESOURCEMANAGER]# cat /run/cloudera-scm-agent/process/1546336134-yarn-RESOURCEMANAGER/fair-scheduler.xml
<?xml version="1.0"?>
<allocations>
<queue name="sample_queue">
<minResources>10000 mb,0vcores</minResources>
<maxResources>90000 mb,0vcores</maxResources>
<maxRunningApps>50</maxRunningApps>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
<queue name="sample_sub_queue">
<aclSubmitApps>charlie</aclSubmitApps>
<minResources>5000 mb,0vcores</minResources>
</queue>
<queue name="sample_reversable_queue">
<resevravation></resevravation>
</queue>
</queue>
<queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
<queueMaxResourcesDefault>5000 mb,0vcores</queueMaxResourcesDefault>
<!-- Queue 'secondary_group_queue' is a parent queue and may have
user queues under it -->
<queue name="secondary_group_queue" type="parent">
<weight>3.0</weight>
<maxChildResources>4096 mb,4vcores</maxChildResources>
</queue>
<user name="sample_user">
<maxRunningApps>30</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>
<queuePlacementPolicy>
<rule name="specified" />
<rule name="primaryGroup" create="false" />
<rule name="nestedUserQueue">
<rule name="secondaryGroupExistingQueue" create="false" />
</rule>
<rule name="default" queue="sample_queue"/>
</queuePlacementPolicy>
</allocations>
2022-09-04 10:39:43,583 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x1000fdc05e46ceb
2022-09-04 10:39:43,583 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler not instance of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy.init(AutoCreatedQueueDeletionPolicy.java:69)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.updateSchedulingMonitors(SchedulingMonitorManager.java:93)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.initialize(SchedulingMonitorManager.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1517)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:853)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1271)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:328)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1558)
2022-09-04 10:39:43,584 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2022-09-04 10:39:43,584 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2022-09-04 10:39:43,584 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler not instance of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy.init(AutoCreatedQueueDeletionPolicy.java:69)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.updateSchedulingMonitors(SchedulingMonitorManager.java:93)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.initialize(SchedulingMonitorManager.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1517)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:853)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1271)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:328)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1558)
2022-09-04 10:39:43,585 INFO org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()
Created 09-07-2022 11:08 PM
Hi @gocham ,
In CDP 7.1.7, Capacity Scheduler is the default and the only supported scheduler; Fair Scheduler is not supported. You must transition from Fair Scheduler to Capacity Scheduler when upgrading your cluster to CDP Private Cloud Base.
This is the related Jira from Cloudera - CLR-106983
Note: If I answered your question, please give a thumbs up and accept it as a solution.
Regards,
Chethan YM
Created 09-08-2022 04:56 AM
Thanks, Chethan.
The solution here https://community.cloudera.com/t5/Support-Questions/Invalid-resource-request-requested-resource-type... is apparently no longer valid.
I switched back to the Capacity Scheduler, increased yarn.nodemanager.resource.memory-mb, and everything seems to be OK now.
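For anyone landing here later, the change described above would roughly correspond to the yarn-site.xml fragment below. This is only a sketch: the memory value is a placeholder I am assuming for illustration, not the value used on this cluster, and on a Cloudera Manager deployment these settings would normally be changed through the UI rather than by editing the file by hand.

```xml
<!-- yarn-site.xml (fragment, illustrative only) -->

<!-- Use the Capacity Scheduler, the scheduler the stack trace above expects -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- Memory each NodeManager offers to containers.
     16384 is an assumed placeholder; size it to your own hosts. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
```

After deploying the configuration, restart the Resource Manager and Node Managers so the new values take effect.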