Details
- Type: Improvement
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Labels: None
Description
While we've done some good work on handling the case where Spark itself chooses to decommission nodes (SPARK-7955), in environments where nodes can be preempted without Spark's involvement (e.g. YARN over-commit, EC2 spot instances, GCE preemptible instances), it would make sense to do something for the data on the node, or at least to stop scheduling new tasks on it.
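As a rough sketch of the direction this points in, an application could opt into migrating data off a node that is about to be reclaimed roughly as below. The configuration keys shown are the storage-decommissioning settings that later Spark releases expose, not something defined by this ticket, so treat them as assumptions here:

    import org.apache.spark.SparkConf

    // Sketch only: react to decommission/preemption notices and ask Spark to
    // migrate block data off the node before it disappears.
    val conf = new SparkConf()
      .set("spark.decommission.enabled", "true")                       // assumed key: handle decommission notices
      .set("spark.storage.decommission.enabled", "true")               // assumed key: migrate block data off the node
      .set("spark.storage.decommission.rddBlocks.enabled", "true")     // assumed key: include cached RDD blocks
      .set("spark.storage.decommission.shuffleBlocks.enabled", "true") // assumed key: include shuffle blocks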
Attachments
Issue Links
- is related to:
  - SPARK-46957 Migrated shuffle data files from the decommissioned node should be removed when job completed (Resolved)
  - SPARK-7955 Dynamic allocation: longer timeout for executors with cached blocks (Closed)
  - SPARK-48636 Event driven block manager decommissioner (Open)
  - SPARK-3174 Provide elastic scaling within a Spark application (Closed)
  - SPARK-33005 Kubernetes GA Preparation (Resolved)
  - SPARK-41550 Dynamic Allocation on K8S GA (Resolved)
Sub-Tasks
1. Improve cache block migration | Open | Unassigned
2. Add an option to reject block migrations when under disk pressure | Open | Unassigned
3. Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases | In Progress | Unassigned (see the listener sketch below)
4. Rename all decommission configurations to use the same namespace "spark.decommission.*" | In Progress | Unassigned
5. Do not drop cached RDD blocks to accommodate blocks from decommissioned block manager if enough memory is not available | In Progress | Unassigned
6. Decommission executors in batches to avoid overloading network by block migrations. | In Progress | Unassigned
7. Put blocks only on disk while migrating RDD cached data | In Progress | Unassigned
8. Decommission logs too frequent when waiting migration to finish | In Progress | Apache Spark
9. Add support for YARN decommissioning when ESS is Enabled | In Progress | Unassigned
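Related to the ExecutorDecommissionInfo / ExecutorDecommissionState sub-task above, here is a minimal monitoring sketch (an assumption-level illustration, not part of this ticket) using the public SparkListener API to surface why an executor went away, which helps tell preemption apart from a planned decommission:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}
    import org.apache.spark.sql.SparkSession

    object ExecutorLossLogger {
      // Install a listener that logs the scheduler-supplied removal reason.
      def install(spark: SparkSession): Unit = {
        spark.sparkContext.addSparkListener(new SparkListener {
          override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
            // event.reason is a free-form string from the cluster manager /
            // scheduler backend; a reclaimed spot or preemptible node usually
            // reports a different reason than a Spark-initiated decommission.
            println(s"Executor ${event.executorId} removed at ${event.time}: ${event.reason}")
          }
        })
      }
    }

Usage: call ExecutorLossLogger.install(spark) once after the SparkSession is created; the log line then appears in the driver output whenever an executor is removed.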