Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This is already discussed on the umbrella JIRA YARN-1489.
Copying some of my condensed summary from the design doc (section 3.2.10.3) of YARN-4692.
Even after the existing work on work-preserving AM restart (Section 3.1.2 / YARN-1489), we still haven’t solved the problem of old running containers not knowing where the new AM starts running after the previous AM crashes. This is an especially important problem to solve for long-running services, where we’d like to avoid killing service containers when AMs fail over. So far we have left this as a task for the apps, but solving it in YARN is highly desirable. This looks very much like the service registry (YARN-913), but for app containers to discover their own AMs.
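To make the requirement concrete, here is a minimal sketch of the lookup pattern described above: each AM attempt publishes its address under the application ID, and a surviving container re-resolves that address after a failover. Everything in it (`AMRegistry`, `AMEndpoint`, `Container`, the application ID, hosts and ports) is hypothetical and in-memory; it is not a YARN or service-registry API, only an illustration of the behaviour being asked for.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AMEndpoint:
    """Hypothetical record an AM attempt would publish."""
    attempt_id: int  # assumed to increase with every AM restart
    host: str
    port: int


class AMRegistry:
    """Toy in-memory stand-in for a registry; not a real YARN API."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, AMEndpoint] = {}

    def register(self, app_id: str, endpoint: AMEndpoint) -> bool:
        """Publish the endpoint unless a same-or-newer attempt is already registered."""
        current = self._endpoints.get(app_id)
        if current is not None and current.attempt_id >= endpoint.attempt_id:
            return False  # stale registration from an old attempt
        self._endpoints[app_id] = endpoint
        return True

    def lookup(self, app_id: str) -> Optional[AMEndpoint]:
        return self._endpoints.get(app_id)


class Container:
    """A long-running container that outlives individual AM attempts."""

    def __init__(self, app_id: str, registry: AMRegistry) -> None:
        self.app_id = app_id
        self.registry = registry
        self.am: Optional[AMEndpoint] = None

    def refresh_am(self) -> bool:
        """Re-resolve the AM; return True if the known endpoint changed."""
        latest = self.registry.lookup(self.app_id)
        changed = latest is not None and latest != self.am
        if changed:
            self.am = latest
        return changed


# Illustrative run: the first AM attempt registers, then fails over to another node.
registry = AMRegistry()
registry.register("application_0001", AMEndpoint(1, "node-a", 8030))

container = Container("application_0001", registry)
first_refresh = container.refresh_am()   # container learns about attempt 1

registry.register("application_0001", AMEndpoint(2, "node-b", 8030))
second_refresh = container.refresh_am()  # container picks up attempt 2

# A late registration from the dead attempt must not overwrite the new one.
stale_accepted = registry.register("application_0001", AMEndpoint(1, "node-a", 8030))
```

In this sketch every container reads from a single registry, which is exactly the read load that the scalability concern in this issue is about.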
Combining this requirement (any container being able to find its AM across failovers) with that of services (being able to find through DNS where a service container is running, YARN-4757) pushes our registry scalability needs well beyond those of service endpoints alone. This calls for a more distributed solution for registry readers, something that is discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
See comment https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359
Attachments
Issue Links
- is part of
  - YARN-1489 [Umbrella] Work-preserving ApplicationMaster restart (Resolved)
  - YARN-4692 [Umbrella] Simplified and first-class support for services in YARN (Reopened)
- is related to
  - MAPREDUCE-6608 Work Preserving AM Restart for MapReduce (Open)
  - YARN-913 Umbrella: Add a way to register long-lived services in a YARN cluster (Open)
  - YARN-4757 [Umbrella] Simplified discovery of services via DNS mechanisms (Resolved)
- relates to
  - YARN-4602 Simple and Scalable Message Service for YARN application (Open)