Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Incomplete
None
None
Description
DAGScheduler has getMissingParentStages() and stageDependsOn() methods that are suspiciously similar to getParentStages(): all three traverse the RDD/Stage graph to inspect parent stages. We can remove getMissingParentStages() and stageDependsOn(), though. The set of parent stages is known when a Stage instance is constructed and is already stored in Stage.parents, so we can find missing parent stages by looking for unavailable stages in Stage.parents. Similarly, we can determine whether one stage depends on another by searching Stage.parents (and their parents in turn) rather than performing the entire graph traversal from scratch.
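A minimal sketch of the proposed approach follows. It is a Python model, not Spark's Scala DAGScheduler code. Stage, missing_parent_stages and stage_depends_on are hypothetical names. The model assumes that each stage records its parents when it is constructed, as the description says Stage.parents does. It also assumes an is_available flag, which stands in for whatever availability check the scheduler actually uses. Beyond Stage.parents and availability, it models nothing about the real graph traversal.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # identity-based equality/hashing, like JVM object references
class Stage:
    """Hypothetical stand-in for a scheduler stage: an id, the parent stages
    fixed at construction time, and whether its output is already available."""
    id: int
    parents: List["Stage"] = field(default_factory=list)
    is_available: bool = False


def missing_parent_stages(stage: Stage) -> List[Stage]:
    """Return the recorded parents whose output is not yet available,
    without re-traversing any underlying RDD graph."""
    return [p for p in stage.parents if not p.is_available]


def stage_depends_on(stage: Stage, target: Stage) -> bool:
    """Return True if `target` is `stage` itself or any ancestor reachable
    by following parent links (an iterative depth-first search)."""
    visited: Set[Stage] = set()
    to_visit = [stage]
    while to_visit:
        current = to_visit.pop()
        if current is target:
            return True
        if current in visited:
            continue  # shared ancestor already expanded
        visited.add(current)
        to_visit.extend(current.parents)
    return False


# A diamond: s3 reads from s1 and s2, which both read from s0.
s0 = Stage(0, is_available=True)
s1 = Stage(1, [s0], is_available=True)
s2 = Stage(2, [s0])
s3 = Stage(3, [s1, s2])

print([s.id for s in missing_parent_stages(s3)])  # [2]
print(stage_depends_on(s3, s0))  # True
print(stage_depends_on(s1, s2))  # False
```

Because the search follows parent links transitively, stage_depends_on finds indirect dependencies (s3 on s0) from the stored parents alone. The visited set keeps a shared ancestor in a diamond from being expanded twice.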
Issue Links
- is duplicated by: SPARK-5374 abstract RDD's DAG graph iteration in DAGScheduler (Closed)
- relates to: SPARK-15927 Eliminate redundant code in DAGScheduler's getParentStages and getAncestorShuffleDependencies methods. (Resolved)