[HIVE-7527] Support order by and sort by on Spark [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels: None

Description

Currently, Hive depends completely on the sorting that MapReduce performs as part of shuffling to achieve order by (global sort, one reducer) and sort by (local sort).
Spark has sort transformations in several variations that can be used to support Hive's order by and sort by. However, we still need to evaluate whether Spark's sortBy can achieve the same functionality that Hive inherits from MapReduce's shuffle sort.
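
To make the distinction between the two clauses concrete, here is a minimal sketch in plain Python. It does not use Hive or Spark; the function names `sort_by` and `order_by`, the modulo partitioning, and the boundary selection are all assumptions made for illustration. It models sort by as "partition, then sort each partition independently" and order by as "range-partition, then sort each partition", which is roughly how a total order can be produced across more than one partition. With a single partition (the one-reducer case described above) the two coincide.

```python
from bisect import bisect_right


def sort_by(rows, num_partitions):
    """Local sort: distribute rows by a simple modulo rule (a stand-in for
    hash partitioning), then sort each partition on its own."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row % num_partitions].append(row)
    return [sorted(p) for p in partitions]


def order_by(rows, num_partitions):
    """Global sort: range-partition rows so that every value in partition i
    is <= every value in partition i + 1, then sort each partition."""
    # Assumption: boundaries come from the fully sorted input. A real engine
    # would estimate them from a sample instead.
    ordered = sorted(rows)
    boundaries = [
        ordered[len(ordered) * i // num_partitions]
        for i in range(1, num_partitions)
    ]
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[bisect_right(boundaries, row)].append(row)
    return [sorted(p) for p in partitions]


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2, 7]
    print("sort by :", sort_by(data, 2))
    print("order by:", order_by(data, 2))
```

Reading the partitions of `order_by` in sequence yields a fully sorted result, whereas the partitions of `sort_by` are each sorted but carry no ordering relative to one another.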

Hive on Spark should already be able to run a simple sort by or order by if the current partitionBy is changed to sortBy. This is a way to verify the theory. A complete solution will not be available until we have a complete SparkPlanGenerator.

There is also the question of how to determine whether a query contains order by or sort by just by looking at the operator tree from which the Spark task is created. This is the responsibility of SparkPlanGenerator, but we need to have an idea of the approach.

Attachments

- HIVE-7527-spark.patch (04/Aug/14 12:35, 11 kB, Rui Li)
- HIVE-7527.2-spark.patch (05/Aug/14 05:15, 12 kB, Rui Li)

Issue Links

relates to: HIVE-7772 Add tests for order/sort/distribute/cluster by query [Spark Branch] (Resolved)

links to: Review Board

People

Assignee: Rui Li

Reporter: Xuefu Zhang

Votes: 0

Watchers: 4

Dates

Created: 27/Jul/14 22:32

Updated: 29/May/15 02:32

Resolved: 05/Aug/14 05:39