YARN HA does not work (both ResourceManagers stay in standby state)

Explorer

Hello,

We have a fairly old CDH 5.7 cluster that works fine. But when we add a second ResourceManager and enable high availability, both RMs remain in standby state and neither becomes active.

This seems to be a known issue, and the suggested fix is to run "yarn resourcemanager -format-state-store". Cloudera itself recommends it here (search for "standby"), and so do other articles on the web. However, running this and restarting the RMs did not solve our problem.

I also couldn't find anything unusual in the logs, and to make things even stranger, we have another 5.7 cluster where we successfully enabled YARN high availability without issues.

Does anyone have an idea what's wrong? Has anyone run into this issue?

Thanks,

Guy

4 REPLIES

Mentor
The quoted documentation also indicates that the specific issue requiring that format was resolved from CDH 5.2.1 onwards, so running it is not necessarily the right fix for your problem.

The RMs in HA mode run an election once both are up. Log entries from the classes org.apache.hadoop.ha.ActiveStandbyElector, org.apache.hadoop.yarn.server.resourcemanager.ResourceManager and org.apache.zookeeper.ZooKeeper help detail that process.

I'd advise checking the logs for these classes and trying to spot what the failure is. It may be ZooKeeper related or caused by some other configuration. Alternatively, share the RM logs via pastebin or similar.
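As a rough illustration of that log check, here is a minimal Python sketch that keeps only the lines mentioning the three classes named above. It assumes the RM log prints the full logger class name on each line, as the usual Hadoop log layout does. The function names `election_lines` and `scan_log` are made up for this example, and the log path is something you would need to supply for your own cluster.

```python
import re

# Logger class names taken from the reply above.
ELECTION_CLASSES = (
    "org.apache.hadoop.ha.ActiveStandbyElector",
    "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager",
    "org.apache.zookeeper.ZooKeeper",
)

# Match a class name only when it is not the prefix of a longer class name.
_PATTERN = re.compile(
    "|".join(re.escape(name) + r"(?!\w)" for name in ELECTION_CLASSES)
)


def election_lines(lines):
    """Return the lines that mention one of the election-related classes."""
    return [line.rstrip("\n") for line in lines if _PATTERN.search(line)]


def scan_log(path):
    """Filter an RM log file on disk (the path is supplied by the caller)."""
    with open(path, errors="replace") as log:
        return election_lines(log)
```

Calling `scan_log` on each RM's log and comparing the two outputs side by side may make it easier to see at which step the election stalls.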

Master Collaborator

Hi @ni4ni

I think the solution is to format the RMStateStore:

yarn resourcemanager -format-state-store

source: https://stackoverflow.com/questions/39369149/resource-manager-does-not-transit-to-active-state-from-...

avatar

During a manual ResourceManager failover, will my scheduled and running applications move to the standby ResourceManager, and if so, how? In HDFS, the JournalNodes keep track of the NameNode edit logs. Which daemon does the equivalent for the ResourceManager?

New Contributor

The YARN ResourceManager keeps writing the status of each running or finished application to the state store. The state store is usually kept either in ZooKeeper or on the local filesystem, depending on the configuration.

When an RM switches from standby to active, it looks for the latest state committed by the other RM and loads it. If this information is lost at any point, the RM will fail to load the application information.
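To see which state store a given RM is configured to use, one option is to read the relevant properties out of its yarn-site.xml. The sketch below is only an illustration: the helper names are invented for this example, and where the effective yarn-site.xml lives depends on how the cluster is deployed. The three property names are the standard YARN ones for RM recovery, the state store class and the ZooKeeper address.

```python
import xml.etree.ElementTree as ET

# Standard YARN property names related to the RM state store.
STATE_STORE_KEYS = (
    "yarn.resourcemanager.recovery.enabled",
    "yarn.resourcemanager.store.class",
    "yarn.resourcemanager.zk-address",
)


def read_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def state_store_settings(xml_text):
    """Return the state-store properties; None means 'not set in this file'."""
    props = read_properties(xml_text)
    return {key: props.get(key) for key in STATE_STORE_KEYS}
```

Running this against the yarn-site.xml of both RMs, and of the cluster where HA works, could show whether they point at the same store and the same ZooKeeper quorum.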