Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

ResourceManager HA vs. YARN work-preserving restart

New Member

When ResourceManager (RM) HA is deployed, the active RM stores its state information under a ZooKeeper base path. My question is: when RM HA is enabled, should work-preserving restart for YARN be enabled along with it?

I assumed that if RM HA is enabled (yarn.resourcemanager.ha.automatic-failover.enabled = true), then yarn.resourcemanager.work-preserving-recovery.enabled should be false, and that only one of the two options should be true at any time.

Work-preserving recovery is also configured with the ZooKeeper address, the state store class, and the store parent path, just like HA. Could you please clarify how the two relate?

1 ACCEPTED SOLUTION

Master Guru

@sirisha A

Work-preserving ResourceManager restart ensures that applications continue to function during a ResourceManager restart, with minimal impact on end users.

The overall concept is that the ResourceManager preserves application queue state in a pluggable state store, and reloads that state on restart. While the ResourceManager is down, ApplicationMasters and NodeManagers continuously poll the ResourceManager until it restarts.

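If automatic failover is also enabled, the standby ResourceManager takes over instead of everyone waiting for the failed one to restart, so the polling period is shorter and your jobs resume quickly. The two options are complementary rather than mutually exclusive, so I would suggest setting both to true in the configuration.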
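
As a rough illustration of what "both options true" could look like, here is a minimal Python sketch that renders a yarn-site.xml fragment. The property names are the ones I know from Apache Hadoop's yarn-default.xml; the ZooKeeper hosts and the helper name render_yarn_site are placeholders I made up for the example, and depending on your Hadoop version the ZooKeeper quorum key may be named differently, so please check against your distribution's documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Property names follow Apache Hadoop's yarn-default.xml.
# All values are illustrative; the ZooKeeper hosts are placeholders
# and do not come from the original thread.
YARN_SITE = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.ha.automatic-failover.enabled": "true",
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.work-preserving-recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
    "yarn.resourcemanager.zk-address":
        "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
    "yarn.resourcemanager.zk-state-store.parent-path": "/rmstore",
}


def render_yarn_site(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yarn_site(YARN_SITE))
```

Note that HA and work-preserving recovery share the same ZooKeeper state store settings here, which is why you see the ZK address, store class, and parent path in both places.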

Hope this information helps.
