
Enabling ResourceManager HA in Ambari leaves ambiguous configs for the "old" ResourceManager


I've got a problem with my ResourceManager's yarn-site.xml config after enabling RM high availability.

Initially I had a ResourceManager on Server1; after I installed Server2 and enabled RM HA, Server2 became the "active" ResourceManager.

My problem is that when Ambari adds "yarn.resourcemanager.cluster-id", "yarn.resourcemanager.hostname.rm1", "yarn.resourcemanager.hostname.rm2" and the other HA entries under the Custom yarn-site section in Ambari (covering both Server1 and Server2), it leaves the "old" "yarn.resourcemanager.scheduler.address" still pointing at Server1.

When I run my YARN apps they try to connect to Server1 on port 8030, where nothing is listening, since Server1 is now the standby ResourceManager. I'd expect them to connect via the "yarn-cluster" name defined in "yarn.resourcemanager.cluster-id", not the specific hostname of the old ResourceManager.
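For reference, with RM HA enabled the relevant yarn-site.xml entries normally look roughly like the following (the hostnames and the cluster-id value here are just the ones from this post, and the exact property set Ambari writes may differ):

yarn.resourcemanager.ha.enabled = true
yarn.resourcemanager.ha.rm-ids = rm1,rm2
yarn.resourcemanager.hostname.rm1 = Server1
yarn.resourcemanager.hostname.rm2 = Server2
yarn.resourcemanager.cluster-id = yarn-cluster
yarn.resourcemanager.scheduler.address = Server1:8030   (the stale entry in question; 8030 is the default scheduler port)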

Is there something I'm overlooking in the Ambari configs, or is this a (known) bug?


4 REPLIES

Master Mentor

@Björn Zettergren Make sure you restart all components with stale configs: look for the orange restart indicator next to each component and restart it. Once everything is restarted, run the service check.
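(If it helps to double-check, one way to see what actually landed in a host's client config, assuming the standard Ambari-managed /etc/hadoop/conf layout:

grep -A1 "yarn.resourcemanager.scheduler.address" /etc/hadoop/conf/yarn-site.xml

which should print the matching property names followed by their value lines.)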


Thanks for a swift reply. However, there are no pending restarts due to stale configs, and the service check completes without any issues detected. Nonetheless I restarted all YARN services and re-ran the service check, but there was no change in behaviour or config content.

I had similar issues with the NameNode rpc-address after enabling NN HA; those disappeared after updating from HDP-2.3.2.0 to HDP-2.3.4.0, hence I suspected this to be a bug.

So, the problem remains.

Master Mentor
@Björn Zettergren

Try this:

Fail over the RM by restarting it so that Node1 becomes the active RM, then run the job and check the behavior.

Fail over again so that Node2 becomes the active RM, then run the job again.
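Before and after each failover you can confirm which RM is actually active; assuming the rm-ids Ambari generated are rm1 and rm2 (adjust if yours differ):

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

Each command should report "active" or "standby".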


I'm not sure if we've been doing it wrong, but after changing our YARN jobs the problem went away; it could have been an issue at our end.

Although I still have a feeling in the back of my head that "yarn.resourcemanager.scheduler.address" should point to something other than a specific node; it should be the "yarn-cluster" address. But this might be an Ambari problem.
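If I understand the HA client behaviour correctly (an assumption on my part, not something I've verified in the code), an HA-aware client resolves per-RM-id addresses and ignores the plain un-suffixed ones, roughly:

yarn.resourcemanager.scheduler.address.rm1 = Server1:8030
yarn.resourcemanager.scheduler.address.rm2 = Server2:8030

with these defaulting to the hostname.rm1/rm2 values plus the standard ports, so the stale un-suffixed yarn.resourcemanager.scheduler.address may simply be ignored once HA is enabled.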

Thanks for your time!