Details
- Type: Improvement
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Labels: None
Description
While we've done some good work on handling the case where Spark itself chooses to decommission nodes (SPARK-7955), in environments where nodes can be preempted without Spark's involvement (e.g. YARN over-commit, EC2 spot instances, GCE preemptible instances), it would make sense to do something for the data on the node, or at least to stop scheduling new tasks on it.
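As a rough sketch of the direction this points in, an application could opt into migrating data off a node that is about to be reclaimed roughly as below. The configuration keys shown are the storage-decommissioning settings that later Spark releases expose, not something defined by this ticket, so treat them as assumptions here:

    import org.apache.spark.SparkConf

    // Sketch only: react to decommission/preemption notices and ask Spark to
    // migrate block data off the node before it disappears.
    val conf = new SparkConf()
      .set("spark.decommission.enabled", "true")                       // assumed key: handle decommission notices
      .set("spark.storage.decommission.enabled", "true")               // assumed key: migrate block data off the node
      .set("spark.storage.decommission.rddBlocks.enabled", "true")     // assumed key: include cached RDD blocks
      .set("spark.storage.decommission.shuffleBlocks.enabled", "true") // assumed key: include shuffle blocks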
Attachments
Issue Links
- is related to:
  - SPARK-46957 Migrated shuffle data files from the decommissioned node should be removed when job completed (Resolved)
  - SPARK-7955 Dynamic allocation: longer timeout for executors with cached blocks (Closed)
  - SPARK-48636 Event driven block manager decommissioner (Open)
  - SPARK-3174 Provide elastic scaling within a Spark application (Closed)
  - SPARK-33005 Kubernetes GA Preparation (Resolved)
  - SPARK-41550 Dynamic Allocation on K8S GA (Resolved)
Sub-Tasks
1. Improve cache block migration | Open | Unassigned
2. Add an option to reject block migrations when under disk pressure | Open | Unassigned
3. Improve ExecutorDecommissionInfo and ExecutorDecommissionState for different use cases | In Progress | Unassigned (see the listener sketch below)
4. Rename all decommission configurations to use the same namespace "spark.decommission.*" | In Progress | Unassigned
5. Do not drop cached RDD blocks to accommodate blocks from decommissioned block manager if enough memory is not available | In Progress | Unassigned
6. Decommission executors in batches to avoid overloading network by block migrations. | In Progress | Unassigned
7. Put blocks only on disk while migrating RDD cached data | In Progress | Unassigned
8. Decommission logs too frequent when waiting migration to finish | In Progress | Apache Spark
9. Add support for YARN decommissioning when ESS is Enabled | In Progress | Unassigned
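Related to the ExecutorDecommissionInfo / ExecutorDecommissionState sub-task above, here is a minimal monitoring sketch (an assumption-level illustration, not part of this ticket) using the public SparkListener API to surface why an executor went away, which helps tell preemption apart from a planned decommission:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorRemoved}
    import org.apache.spark.sql.SparkSession

    object ExecutorLossLogger {
      // Install a listener that logs the scheduler-supplied removal reason.
      def install(spark: SparkSession): Unit = {
        spark.sparkContext.addSparkListener(new SparkListener {
          override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
            // event.reason is a free-form string from the cluster manager /
            // scheduler backend; a reclaimed spot or preemptible node usually
            // reports a different reason than a Spark-initiated decommission.
            println(s"Executor ${event.executorId} removed at ${event.time}: ${event.reason}")
          }
        })
      }
    }

Usage: call ExecutorLossLogger.install(spark) once after the SparkSession is created; the log line then appears in the driver output whenever an executor is removed.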