Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 2.9.0, 3.0.0-alpha1
- None
- Reviewed
Description
When the RM fails over, it does not automatically re-register running apps, so each AM needs to re-register when it reconnects to the new primary. The AM does this by catching ApplicationMasterNotRegisteredException in allocate calls and re-registering. But RequestHedgingRMFailoverProxyProvider does not propagate YarnException, because the actual invocation is done asynchronously on separate threads, so AMs cannot reconnect to the RM after failover.
This JIRA proposes that the RequestHedgingRMFailoverProxyProvider propagate any YarnException that it encounters.
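The failure mode and the proposed behavior can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes or reproduce the actual patch: SketchYarnException, SketchNotRegisteredException, FakeRm, invokeHedged and allocateWithReRegister are all hypothetical stand-ins. The assumption it illustrates is that a call run on a separate thread surfaces its failure wrapped in an ExecutionException, so the caller only sees the application-level exception if the wrapper is unwrapped and the cause rethrown.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class HedgingSketch {
    // Hypothetical stand-in for YarnException (not the Hadoop class).
    static class SketchYarnException extends Exception {
        SketchYarnException(String message) { super(message); }
    }

    // Hypothetical stand-in for ApplicationMasterNotRegisteredException.
    static class SketchNotRegisteredException extends SketchYarnException {
        SketchNotRegisteredException(String message) { super(message); }
    }

    // Hypothetical RM that rejects allocate() until register() has been called.
    static class FakeRm {
        final AtomicInteger registerCalls = new AtomicInteger();

        void register() { registerCalls.incrementAndGet(); }

        String allocate() throws SketchYarnException {
            if (registerCalls.get() == 0) {
                throw new SketchNotRegisteredException("AM not registered");
            }
            return "allocated";
        }
    }

    // Runs the calls on separate threads and returns the first one to complete.
    // If that call failed with a SketchYarnException, the cause is unwrapped
    // from the ExecutionException and rethrown so the caller can react to it.
    static <T> T invokeHedged(List<Callable<T>> calls)
            throws SketchYarnException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        try {
            ExecutorCompletionService<T> done = new ExecutorCompletionService<>(pool);
            for (Callable<T> call : calls) {
                done.submit(call);
            }
            try {
                return done.take().get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof SketchYarnException) {
                    throw (SketchYarnException) e.getCause();
                }
                throw e;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    // AM-side pattern from the description: catch "not registered" on
    // allocate, re-register, then retry the allocate call once.
    static String allocateWithReRegister(FakeRm rm) throws Exception {
        Callable<String> call = rm::allocate;
        List<Callable<String>> calls = Collections.singletonList(call);
        try {
            return invokeHedged(calls);
        } catch (SketchNotRegisteredException e) {
            rm.register();
            return invokeHedged(calls);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(allocateWithReRegister(new FakeRm())); // prints "allocated"
    }
}
```

If invokeHedged rethrew only the ExecutionException wrapper, the catch block in allocateWithReRegister would never match and the re-registration step would be skipped, which is the symptom this issue describes.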
Attachments
Issue Links
- is related to
  - YARN-4496 Improve HA ResourceManager Failover detection on the client (Resolved)