Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.2.0, 2.3.0
- Fix Version/s: None
Description
We want to use dynamic allocation to distribute resources among many notebook users on our Spark clusters. One difficulty is that if a user has cached data, then we are either prevented from de-allocating any of their executors, or we are forced to drop their cached data, which can lead to a bad user experience.
We propose adding a feature that preserves cached data by copying it to other executors before de-allocation, enabled by a simple Spark config. When an executor reaches its configured idle timeout, instead of killing it on the spot, we stop sending it new tasks, replicate all of its RDD blocks onto other executors, and then kill it. If anything goes wrong during replication (an error, a timeout, or insufficient space on the remaining executors), we fall back to the original behavior: drop the data and kill the executor.
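For context, dynamic allocation already exposes `spark.dynamicAllocation.cachedExecutorIdleTimeout` to control how long executors holding cached blocks may sit idle. A deployment using the proposed behavior might look like the fragment below; note that the recovery flag name is hypothetical, since this ticket does not name the new config key.

```
# spark-defaults.conf (sketch; spark.dynamicAllocation.recoverCachedData is a
# hypothetical name for the proposed opt-in flag)
spark.dynamicAllocation.enabled                     true
spark.dynamicAllocation.cachedExecutorIdleTimeout   120s
spark.dynamicAllocation.recoverCachedData           true
```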
This feature should allow anyone serving notebook users to use their cluster resources more efficiently. And since it is completely opt-in, it is unlikely to cause problems for other use cases.
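The replicate-then-kill flow described above can be sketched as follows. This is a minimal illustration of the proposed control logic, not Spark's actual implementation; the function and parameter names are invented for the example.

```python
import time


def decommission_executor(executor, peers, replicate, timeout_s=60.0):
    """Decide how an idle executor's cached blocks are handled before kill.

    executor: object with a .cached_blocks list of block ids
    peers: other executors that may receive replicas
    replicate: callable(block, peers) -> True on successful copy

    Returns "replicated" if every block was copied elsewhere, or "dropped"
    if we had to fall back to the original drop-and-kill behavior.
    """
    deadline = time.monotonic() + timeout_s
    for block in list(executor.cached_blocks):
        # Fall back to the old behavior on timeout, error, or failed copy
        # (e.g. not enough space on the surviving executors).
        if time.monotonic() > deadline:
            return "dropped"
        try:
            if not replicate(block, peers):
                return "dropped"
        except Exception:
            return "dropped"
    return "replicated"
```

Either way the executor is killed afterwards; the return value only records whether its cached blocks survived.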
Attachments
Issue Links
- Blocked
  - SPARK-36446 YARN shuffle server restart crashes all dynamic allocation jobs that have deallocated an executor (Open)
- is part of
  - SPARK-21084 Improvements to dynamic allocation for notebook use cases (Resolved)
- relates to
  - SPARK-25888 Service requests for persist() blocks via external service after dynamic deallocation (Resolved)