
YARN HA does not work (both ResourceManagers stay in standby state)



We have a fairly old CDH 5.7 cluster that works fine. But when we try to add a second ResourceManager and enable high availability, both RMs remain in standby state and neither becomes active.

This seems to be a known issue, and the suggested fix is to run "yarn resourcemanager -format-state-store". Cloudera itself recommends it here (search for "standby"), and so do other articles on the web. However, running this command and restarting the RMs did not solve our problem.

I also couldn't find anything unusual in the logs. To make things even stranger, we have another 5.7 cluster where we successfully enabled YARN high availability without issues.


Does anyone have an idea what's wrong? Has anyone else run into this issue?






Re: YARN HA does not work (both ResourceManagers stay in standby state)

The quoted documentation also indicates that the specific issue requiring that format was resolved as of CDH 5.2.1, so you shouldn't need to run it as a fix for your problem.

The RMs in HA mode run an election once both are up. Log messages from the classes org.apache.hadoop.ha.ActiveStandbyElector, org.apache.hadoop.yarn.server.resourcemanager.ResourceManager and org.apache.zookeeper.ZooKeeper help detail that process.
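
As a rough aid for that log check, here is a minimal Python sketch that keeps only the lines mentioning one of the three classes named above. It assumes the logger class name appears verbatim in each log line; the function name election_lines is made up for illustration, and the log path in the usage comment is left as a placeholder because it depends on your deployment.

```python
import re

# Logger classes named in the reply above.
ELECTION_CLASSES = (
    "org.apache.hadoop.ha.ActiveStandbyElector",
    "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager",
    "org.apache.zookeeper.ZooKeeper",
)

# Match a class name only when it is not followed by another word character,
# so that e.g. "org.apache.zookeeper.ZooKeeperMain" is not picked up.
_PATTERN = re.compile(
    "(?:" + "|".join(re.escape(c) for c in ELECTION_CLASSES) + r")(?!\w)"
)


def election_lines(lines):
    """Return the log lines that mention one of ELECTION_CLASSES.

    `lines` is any iterable of strings, for example an open log file.
    Trailing newlines are stripped from the returned lines.
    """
    return [line.rstrip("\n") for line in lines if _PATTERN.search(line)]


# Hypothetical usage (replace the path with your RM log file):
#
#     with open("/path/to/resourcemanager.log", errors="replace") as fh:
#         for line in election_lines(fh):
#             print(line)
```

Running this against the log of each RM and comparing the two outputs side by side may make it easier to see where the election stalls.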

I'd advise checking the logs for these classes and trying to spot what the failure is. It may be related to ZK or to some other configuration. Alternatively, share the RM logs via pastebin or similar.
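
For the configuration side, a quick sanity check of the yarn-site.xml that each RM actually runs with can rule out the basics. The sketch below is only an assumption about what is worth checking, not a diagnosis of this cluster: it looks at four standard YARN HA properties (yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, yarn.resourcemanager.zk-address and yarn.resourcemanager.cluster-id). The helper names read_hadoop_conf and ha_problems are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def ha_problems(conf):
    """Return a list of human-readable problems with basic RM HA settings.

    This is an illustrative checklist, not an exhaustive validation.
    """
    problems = []
    if conf.get("yarn.resourcemanager.ha.enabled", "false").lower() != "true":
        problems.append("yarn.resourcemanager.ha.enabled is not true")
    rm_ids = [
        i.strip()
        for i in conf.get("yarn.resourcemanager.ha.rm-ids", "").split(",")
        if i.strip()
    ]
    if len(rm_ids) < 2:
        problems.append("yarn.resourcemanager.ha.rm-ids lists fewer than two RMs")
    for key in ("yarn.resourcemanager.zk-address", "yarn.resourcemanager.cluster-id"):
        if not conf.get(key):
            problems.append(key + " is not set")
    return problems


# Hypothetical usage (replace the path with the RM's effective yarn-site.xml):
#
#     with open("/path/to/yarn-site.xml") as fh:
#         for problem in ha_problems(read_hadoop_conf(fh.read())):
#             print(problem)
```

An empty result only means these four settings are present; it says nothing about whether the ZooKeeper quorum is reachable, which the election logs would show.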