Cannot start Resource Manager with Fair Scheduler

Explorer

Hi,

I switched to the Fair Scheduler, but I still can't start the Resource Manager.

I'm using CDP 7.4.4

Following the steps listed here:

https://community.cloudera.com/t5/Support-Questions/Unable-to-start-Node-Manager/td-p/285976

I made the following changes via the UI and verified the deployed files.

In yarn-site.xml:

[Screenshot: gocham_0-1662288257126.png]


I can see from the Resource Manager log that the Fair Scheduler is loaded:

2022-09-04 10:39:43,377 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file file:/run/cloudera-scm-agent/process/1546336134-yarn-RESOURCEMANAGER/fair-scheduler.xml

 

[root@10-222-53-95 1546336089-yarn-RESOURCEMANAGER]# cat /run/cloudera-scm-agent/process/1546336134-yarn-RESOURCEMANAGER/fair-scheduler.xml
<?xml version="1.0"?>
<allocations>
<queue name="sample_queue">
<minResources>10000 mb,0vcores</minResources>
<maxResources>90000 mb,0vcores</maxResources>
<maxRunningApps>50</maxRunningApps>

<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
<queue name="sample_sub_queue">
<aclSubmitApps>charlie</aclSubmitApps>
<minResources>5000 mb,0vcores</minResources>
</queue>
<queue name="sample_reversable_queue">
<resevravation></resevravation>
</queue>
</queue>

<queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
<queueMaxResourcesDefault>5000 mb,0vcores</queueMaxResourcesDefault>

<!-- Queue 'secondary_group_queue' is a parent queue and may have
user queues under it -->
<queue name="secondary_group_queue" type="parent">
<weight>3.0</weight>
<maxChildResources>4096 mb,4vcores</maxChildResources>

</queue>

<user name="sample_user">
<maxRunningApps>30</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>

<queuePlacementPolicy>
<rule name="specified" />
<rule name="primaryGroup" create="false" />
<rule name="nestedUserQueue">
<rule name="secondaryGroupExistingQueue" create="false" />
</rule>
<rule name="default" queue="sample_queue"/>
</queuePlacementPolicy>

 

2022-09-04 10:39:43,583 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down for session: 0x1000fdc05e46ceb
2022-09-04 10:39:43,583 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler not instance of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy.init(AutoCreatedQueueDeletionPolicy.java:69)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.updateSchedulingMonitors(SchedulingMonitorManager.java:93)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.initialize(SchedulingMonitorManager.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1517)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:853)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1271)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:328)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1558)
2022-09-04 10:39:43,584 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2022-09-04 10:39:43,584 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2022-09-04 10:39:43,584 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler not instance of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy.init(AutoCreatedQueueDeletionPolicy.java:69)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.serviceInit(SchedulingMonitor.java:61)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.updateSchedulingMonitors(SchedulingMonitorManager.java:93)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitorManager.initialize(SchedulingMonitorManager.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1517)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:853)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1271)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:328)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1558)
2022-09-04 10:39:43,585 INFO org.apache.ranger.audit.provider.AuditProviderFactory: ==> JVMShutdownHook.run()
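
In a long ResourceManager log, the entry that matters is the FATAL line and the exception summary that follows it. A minimal Python sketch for pulling that out is below. It is not a Cloudera tool: the function name first_fatal is made up for illustration, the sample text is copied from the log above, and it assumes log lines of the form "timestamp LEVEL logger: message".

```python
import re

# Matches log lines such as
# "2022-09-04 10:39:43,584 FATAL org...ResourceManager: Error starting ResourceManager"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)


def first_fatal(log_text):
    """Return (timestamp, message, exception_line) for the first FATAL entry, or None."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = LOG_LINE.match(line)
        if m and m.group("level") == "FATAL":
            # The exception summary, if present, is the next line that is
            # not itself a timestamped log entry.
            exc = None
            if i + 1 < len(lines) and not LOG_LINE.match(lines[i + 1]):
                exc = lines[i + 1].strip()
            return m.group("ts"), m.group("msg"), exc
    return None


# Sample lines copied from the log in this post.
sample = """\
2022-09-04 10:39:43,584 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2022-09-04 10:39:43,584 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Class org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler not instance of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy.init(AutoCreatedQueueDeletionPolicy.java:69)
"""

if __name__ == "__main__":
    print(first_fatal(sample))
```

For a real log, the file contents would be read and passed in place of sample.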

1 ACCEPTED SOLUTION

Master Collaborator

Hi @gocham ,

 

In CDP 7.1.7, Fair Scheduler is not supported. Capacity Scheduler is the default and only supported scheduler. You must transition from Fair Scheduler to Capacity Scheduler when upgrading your cluster to CDP Private Cloud Base.

 

This is the related Jira from Cloudera - CLR-106983

 

Note: If I answered your question, please give a thumbs up and accept it as a solution.

 

Regards,

Chethan YM
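
The stack trace in the question shows AutoCreatedQueueDeletionPolicy, a class in the capacity scheduler package, being initialized by the scheduling monitor while FairScheduler starts. A minimal Python sketch of a check for that mismatch follows. It assumes such policies are listed under yarn.resourcemanager.scheduler.monitor.policies in yarn-site.xml, which may not be how Cloudera Manager actually deploys them. The helper names and the sample XML are hypothetical and are not the poster's real configuration.

```python
import xml.etree.ElementTree as ET

FAIR = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
CAPACITY_PKG = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity."


def load_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def scheduler_conflicts(props):
    """Return monitor policies from the capacity package configured alongside FairScheduler."""
    if props.get("yarn.resourcemanager.scheduler.class") != FAIR:
        return []
    policies = props.get("yarn.resourcemanager.scheduler.monitor.policies", "")
    return [p.strip() for p in policies.split(",")
            if p.strip().startswith(CAPACITY_PKG)]


# Hypothetical yarn-site.xml fragment, written only to exercise the check.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.policies</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedQueueDeletionPolicy</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for policy in scheduler_conflicts(load_properties(SAMPLE)):
        print("capacity-only policy configured with FairScheduler:", policy)
```

An empty result only means this particular mismatch was not found in the file that was parsed. It does not mean Fair Scheduler is supported on the release in use.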


2 REPLIES

Explorer

Thanks, Chethan.

The solution here https://community.cloudera.com/t5/Support-Questions/Invalid-resource-request-requested-resource-type... is apparently no longer valid.

I switched back to the Capacity Scheduler, increased yarn.nodemanager.resource.memory-mb, and everything seems to be OK now.