[SPARK-22614] Expose range partitioning shuffle - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: Shuffle, Spark Core, SQL
Labels: None

Description

Right now, the Dataset API only offers two possibilities for explicitly repartitioning a dataset:

round robin partitioning, via def repartition(numPartitions: Int)
hash partitioning, via def repartition(numPartitions: Int, partitionExprs: Column*)

It would be useful to also expose range partitioning, which can, for example, improve compression when writing data out to disk, or potentially enable new use cases.
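To make the difference between the three strategies concrete, here is a minimal, Spark-free sketch in plain Scala. It is not Spark's implementation. The object and method names (`PartitioningSketch`, `roundRobin`, `hash`, `range`) are hypothetical, the hash is a plain modulo rather than the hash function Spark uses, and the range bounds are passed in by hand, whereas a real range partitioner would typically derive them from the data.

```scala
object PartitioningSketch {
  // Round robin: rows are dealt out in turn, ignoring their values.
  def roundRobin(rows: Seq[Int], numPartitions: Int): Map[Int, Seq[Int]] =
    rows.zipWithIndex
      .groupBy { case (_, index) => index % numPartitions }
      .map { case (partition, pairs) => partition -> pairs.map(_._1) }

  // Hash: the partition is derived from a hash of the key, so equal keys
  // land together but neighbouring keys are scattered.
  // (Simplification: a plain modulo of hashCode, assumed for illustration.)
  def hash(rows: Seq[Int], numPartitions: Int): Map[Int, Seq[Int]] =
    rows.groupBy(key => Math.floorMod(key.hashCode, numPartitions))

  // Range: sorted, inclusive upper bounds split the key space into
  // contiguous ranges; keys above the last bound go to a final partition.
  // (Assumption: bounds are supplied by the caller.)
  def range(rows: Seq[Int], upperBounds: Seq[Int]): Map[Int, Seq[Int]] =
    rows.groupBy { key =>
      val index = upperBounds.indexWhere(bound => key <= bound)
      if (index == -1) upperBounds.length else index
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(7, 1, 9, 4, 12, 3, 8, 10)
    println(s"round robin: ${roundRobin(rows, 3)}")
    println(s"hash:        ${hash(rows, 3)}")
    println(s"range:       ${range(rows, Seq(4, 8))}")
  }
}
```

Only the range variant guarantees that each partition holds a contiguous slice of the key space. That clustering of similar values is what may help compression when the data is written out. In Spark 2.3.0 and later, the Dataset API offers this through `repartitionByRange(numPartitions: Int, partitionExprs: Column*)`.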

Issue Links

is related to

SPARK-22624 Expose range partitioning shuffle introduced by SPARK-22614 (Resolved)

links to

[Github] Pull Request #19828 (adrian-ionescu)

[Github] Pull Request #20456 (xubo245)

People

Assignee: Adrian Ionescu

Reporter: Adrian Ionescu

Shepherd: Herman van Hövell

Votes: 0

Watchers: 4

Dates

Created: 27/Nov/17 11:45

Updated: 17/May/20 18:29

Resolved: 30/Nov/17 23:42