YARN ResourceManager Active/Standby Behavior

I need help understanding the YARN ResourceManager Active/Standby behavior.

Setup Context:

  • Two nodes are configured to host the YARN ResourceManager service.
  • One node acts as "Active" while the other is "Standby".
  • High Availability is turned off. If the "Active" service is terminated, a fail-over should not occur.

To better understand how the YARN ResourceManager works, I stopped the "Active" service and studied the log output. As expected, no failover occurred. However, when I started the service again, it came back labeled "Standby", and it has stayed "Standby" for well over 5 minutes.

At this point, both services are "Standby". Only the original "Active" service shows log activity, as expected, and its log contains a number of metrics errors:

  • No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
  • Unable to connect to collector, http://null:6188/ws/v1/timeline/metrics This exceptions will be ignored for next 100 times

Would anyone happen to be familiar with such a situation? I greatly appreciate your time and input.

Accepted Solution

Hi @Anthony Seluk,

When you disable High Availability, automatic failover also gets disabled.

Hence, when the active RM is killed, YARN does not automatically fail over to the standby RM (that is, it does not promote the standby RM to active). A restarted RM also comes back in standby and waits to be made active, so neither RM is active and you end up with two standby RMs.

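In this scenario, you need to perform a manual failover, that is, make one of the RMs active yourself with yarn rmadmin -transitionToActive <rm-id>. More information on manual failover is available here: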

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
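As a rough sketch of those manual steps, the Python below plans which yarn rmadmin commands to run, given each RM's state as printed by yarn rmadmin -getServiceState <rm-id>. The RM ids rm1 and rm2 and all helper names are hypothetical; use the ids listed in yarn.resourcemanager.ha.rm-ids in your yarn-site.xml. It assumes automatic failover is disabled, as in this thread (with it enabled, manual transitions are refused unless you add --forcemanual). It only builds the command list; nothing is executed unless you call execute().

```python
import subprocess
from typing import Dict, List

# Hypothetical RM ids; replace with the values of yarn.resourcemanager.ha.rm-ids.
RM_IDS = ["rm1", "rm2"]


def state_query(rm_id: str) -> List[str]:
    """Command that prints 'active' or 'standby' for one RM."""
    return ["yarn", "rmadmin", "-getServiceState", rm_id]


def plan_manual_failover(states: Dict[str, str], target: str) -> List[List[str]]:
    """Return the rmadmin commands that leave `target` as the only active RM.

    `states` maps each RM id to the text printed by -getServiceState.
    """
    normalized = {rm: s.strip().lower() for rm, s in states.items()}
    if target not in normalized:
        raise ValueError(f"unknown RM id: {target}")
    commands = []
    # Manual failover: first demote any other RM that is currently active...
    for rm, state in normalized.items():
        if rm != target and state == "active":
            commands.append(["yarn", "rmadmin", "-transitionToStandby", rm])
    # ...then promote the target if it is not already active.
    if normalized[target] != "active":
        commands.append(["yarn", "rmadmin", "-transitionToActive", target])
    return commands


def query_states(rm_ids: List[str]) -> Dict[str, str]:
    """Ask each RM for its HA state (requires the yarn CLI on PATH)."""
    return {
        rm: subprocess.run(
            state_query(rm), capture_output=True, text=True, check=True
        ).stdout
        for rm in rm_ids
    }


def execute(commands: List[List[str]]) -> None:
    """Run the planned commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # The situation in this thread: both RMs report standby.
    observed = {"rm1": "standby\n", "rm2": "standby\n"}
    for cmd in plan_manual_failover(observed, target="rm1"):
        print(" ".join(cmd))
```

For the both-standby case above, the plan is a single command, yarn rmadmin -transitionToActive rm1. On a real cluster you would pass query_states(RM_IDS) instead of the hard-coded dictionary, review the printed plan, and only then call execute().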
