Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This JIRA will track the implementation of improvements to the handling of intermediate data (e.g., map output). Specifically, it tracks changes in support of preempting running tasks, checkpointing completed work, and spawning one or more tasks to complete the original split/partition. These mechanisms allow one to manage skew in intermediate data, respond to resource abundance or scarcity (particularly with preemption), speculatively execute on the remaining work from checkpointed tasks, and automatically tune parameters for performance.
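The preempt, checkpoint, and re-split flow described above can be illustrated with a minimal sketch. This is not the implementation tracked by this JIRA: the names `Checkpoint`, `run_until_preempted`, and `remaining_splits` are hypothetical, a split is modeled as a plain range of integer offsets, and the "map work" is a stand-in. It only shows the idea that completed work is preserved and the unprocessed remainder of the original split is handed to one or more new tasks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Checkpoint:
    """Hypothetical record of the work a preempted task completed."""
    split_start: int           # first offset of the original split
    split_end: int             # one past the last offset of the original split
    next_offset: int           # first offset that was NOT processed
    partial_output: List[int]  # output produced before preemption


def run_until_preempted(start: int, end: int, preempt_at: int) -> Checkpoint:
    """Process offsets [start, end), stopping early at `preempt_at`.

    `preempt_at` simulates the point at which the scheduler asks the task
    to release its container (an assumption made for this sketch).
    """
    output = []
    offset = start
    while offset < end and offset < preempt_at:
        output.append(offset * 2)  # stand-in for real map work
        offset += 1
    return Checkpoint(start, end, offset, output)


def remaining_splits(cp: Checkpoint, num_tasks: int) -> List[Tuple[int, int]]:
    """Divide the unprocessed range into at most `num_tasks` contiguous splits."""
    remaining = cp.split_end - cp.next_offset
    if remaining <= 0:
        return []  # the task finished; nothing left to hand off
    num_tasks = max(1, min(num_tasks, remaining))
    base, extra = divmod(remaining, num_tasks)
    splits = []
    lo = cp.next_offset
    for i in range(num_tasks):
        # The first `extra` splits take one additional offset each.
        hi = lo + base + (1 if i < extra else 0)
        splits.append((lo, hi))
        lo = hi
    return splits


if __name__ == "__main__":
    cp = run_until_preempted(start=0, end=10, preempt_at=4)
    print(cp.partial_output)         # work preserved by the checkpoint
    print(remaining_splits(cp, 3))   # ranges for the follow-up tasks
```

Choosing `num_tasks` greater than one corresponds to spawning several tasks over the remainder, which is one way to respond to skew or to newly available resources; choosing one simply resumes the original work elsewhere.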
Iterations will build on lessons from previous work, including the following:
Technical reports:
http://research.yahoo.com/files/yl-2012-002.pdf
http://research.yahoo.com/files/yl-2012-003.pdf
Source code:
http://code.google.com/p/sailfish
Issue Links
- contains
  - YARN-567 RM changes to support preemption for FairScheduler and CapacityScheduler (Closed)
  - YARN-568 FairScheduler: support for work-preserving preemption (Closed)
  - YARN-569 CapacityScheduler: support for preemption (using a capacity monitor) (Closed)
- incorporates
  - YARN-650 User guide for preemption (Resolved)
  - MAPREDUCE-5176 Preemptable annotations (to support preemption in MR) (Closed)
  - YARN-45 [Preemption] Scheduler feedback to AM to release containers (Closed)
  - MAPREDUCE-4585 Checkpoint shuffle aggregation as map output (Open)
  - MAPREDUCE-4586 Reduce large output segments directly from remote host (Open)
  - MAPREDUCE-4587 Support fetch by key boundaries for memcmp types (Open)
  - MAPREDUCE-4588 Map local segments as on-disk segments (Open)
  - MAPREDUCE-4589 MapTask preemption (Open)
  - MAPREDUCE-4590 ReduceTask preemption (Open)
  - MAPREDUCE-4591 Extend IFile format to include optional metadata (Open)
  - MAPREDUCE-4592 Collect statistics on key distributions/samples in intermediate data (Open)
  - MAPREDUCE-5196 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing (Resolved)
  - MAPREDUCE-5197 Checkpoint Service: a library component to facilitate checkpoint of task state (Resolved)
  - MAPREDUCE-5192 Separate TCE resolution from fetch (Closed)
  - MAPREDUCE-5194 Heed interrupts during Fetcher shutdown (Closed)
- is related to
  - YARN-291 [Umbrella] Dynamic resource configuration (Open)