Spark / SPARK-22614

Expose range partitioning shuffle

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Shuffle, Spark Core, SQL
    • Labels: None

    Description

      Right now, the Dataset API only offers two possibilities for explicitly repartitioning a dataset:

      • round robin partitioning, via def repartition(numPartitions: Int)
      • hash partitioning, via def repartition(numPartitions: Int, partitionExprs: Column*)

      It would be useful to also expose range partitioning, which can, for example, improve compression when writing data out to disk, or potentially enable new use cases.
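
To make the difference between the three strategies concrete, here is a minimal plain-Scala sketch that assigns a handful of integer keys to partitions. It is an illustration only and not Spark's implementation: the object PartitioningSketch, its method names, and the sample keys are all hypothetical, and the range bounds are computed from the full sorted key list rather than from a sample.

```scala
object PartitioningSketch {
  // Round robin: the row's position decides the partition; the key is ignored.
  def roundRobinPartition(index: Int, numPartitions: Int): Int =
    index % numPartitions

  // Hash: equal keys always land together, but neighbouring keys usually do not.
  def hashPartition(key: Int, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Range: pick up to numPartitions - 1 upper bounds from the sorted keys so
  // that each range holds roughly the same number of keys.
  def rangeBounds(keys: Seq[Int], numPartitions: Int): Vector[Int] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val sorted = keys.sorted.toVector
    if (sorted.isEmpty) Vector.empty
    else (1 until numPartitions).map { i =>
      sorted(math.max(0, i * sorted.length / numPartitions - 1))
    }.distinct.toVector
  }

  // A key goes to the first partition whose upper bound is >= the key;
  // keys above every bound go to the last partition.
  def rangePartition(key: Int, bounds: Vector[Int]): Int = {
    val i = bounds.indexWhere(key <= _)
    if (i >= 0) i else bounds.length
  }

  def main(args: Array[String]): Unit = {
    val keys = Vector(42, 7, 19, 3, 88, 61, 15, 70) // hypothetical sample keys
    val n = 4
    val bounds = rangeBounds(keys, n)

    // Print the keys grouped by the partition each strategy assigned them to.
    def show(name: String, assignment: Seq[(Int, Int)]): Unit = {
      println(name)
      assignment.groupBy(_._1).toSeq.sortBy(_._1).foreach { case (p, rows) =>
        println(s"  partition $p: ${rows.map(_._2).mkString(", ")}")
      }
    }

    show("round robin", keys.zipWithIndex.map { case (k, i) => (roundRobinPartition(i, n), k) })
    show("hash", keys.map(k => (hashPartition(k, n), k)))
    show("range", keys.map(k => (rangePartition(k, bounds), k)))
  }
}
```

Only the range strategy gives every partition a contiguous slice of the key space, so similar values end up stored next to each other, which is the property that tends to help compression. In Spark itself the first two strategies correspond to the two repartition overloads quoted above; the range variant shipped in 2.3.0 as Dataset.repartitionByRange.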

            People

              a.ionescu Adrian Ionescu
              a.ionescu Adrian Ionescu
              Herman van Hövell
              Votes: 0
              Watchers: 4