Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Implemented
- 2.3.0
- None
- None
Description
From a discussion on the dev list, there is consensus around adding interfaces to pass required sorting and clustering to Spark. The proposal is to add:
interface RequiresClustering {
  Set<Expression> requiredClustering();
}

interface RequiresSort {
  List<SortOrder> requiredOrdering();
}
When only RequiresSort is present, Spark would perform a global sort. When RequiresClustering is also present, its clustering would override the partitioning introduced by that sort, making the sort local to each partition.
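The rule above can be sketched as follows. This is only an illustration of the proposal, not Spark code: `Expression` and `SortOrder` are minimal stand-ins for Spark's types, and `WritePlanSketch`, `SortScope`, `SortedWriter`, and `ClusteredSortedWriter` are hypothetical names introduced for the example.

```java
import java.util.List;
import java.util.Set;

// Stand-ins for Spark's types so the sketch compiles on its own (assumption).
final class Expression {
    final String name;
    Expression(String name) { this.name = name; }
}

final class SortOrder {
    final Expression expr;
    final boolean ascending;
    SortOrder(Expression expr, boolean ascending) {
        this.expr = expr;
        this.ascending = ascending;
    }
}

// The two interfaces as proposed in the description.
interface RequiresClustering {
    Set<Expression> requiredClustering();
}

interface RequiresSort {
    List<SortOrder> requiredOrdering();
}

// Hypothetical writer that only asks for an ordering.
final class SortedWriter implements RequiresSort {
    public List<SortOrder> requiredOrdering() {
        return List.of(new SortOrder(new Expression("ts"), true));
    }
}

// Hypothetical writer that asks for both clustering and an ordering.
final class ClusteredSortedWriter implements RequiresClustering, RequiresSort {
    public Set<Expression> requiredClustering() {
        return Set.of(new Expression("day"));
    }
    public List<SortOrder> requiredOrdering() {
        return List.of(new SortOrder(new Expression("ts"), true));
    }
}

// Hypothetical helper that applies the rule from the description.
final class WritePlanSketch {
    enum SortScope { NONE, GLOBAL, LOCAL }

    static SortScope sortScope(Object writer) {
        if (!(writer instanceof RequiresSort)) {
            return SortScope.NONE;   // no ordering requested
        }
        // Clustering overrides the partitioning a global sort would introduce,
        // so the sort becomes local to each partition.
        return (writer instanceof RequiresClustering) ? SortScope.LOCAL : SortScope.GLOBAL;
    }

    public static void main(String[] args) {
        System.out.println(sortScope(new SortedWriter()));          // prints GLOBAL
        System.out.println(sortScope(new ClusteredSortedWriter())); // prints LOCAL
    }
}
```

A writer that implements only RequiresClustering falls into the NONE case here, since the description does not request any sort for it.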
Issue Links
- relates to
- SPARK-33779 DataSource V2: API to request distribution and ordering on write (Resolved)
- SPARK-33808 DataSource V2: Build logical writes in the optimizer (Resolved)
- SPARK-34026 DataSource V2: Inject repartition and sort nodes to satisfy required distribution and ordering (Resolved)
- SPARK-34049 DataSource V2: Use Write abstraction in StreamExecution (Resolved)