[HIVE-7384] Research into reduce-side join [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels: None

Description

Hive's join operator is very sophisticated, especially for reduce-side join. While we expect that other types of join, such as map-side join and SMB map-side join, will work out of the box with our design, there may be some complications in reduce-side join, which makes extensive use of key tags and shuffle behavior. Our design principle is to prefer making the existing Hive implementation work out of the box as well, which might require new functionality from Spark. The task is to research this area, identifying requirements for the Spark community and the work to be done in Hive to make reduce-side join work.

A design doc might be needed for this. For more information, please refer to the overall design doc on the wiki.
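
To illustrate what "key tag and shuffle behavior" refers to, here is a minimal, framework-free sketch of a generic MapReduce-style reduce-side join. It is not Hive's or Spark's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample rows are hypothetical and exist only for this illustration. Each row is emitted under its join key plus a tag identifying its source table, the shuffle sorts by (join key, tag), and the reducer relies on that ordering to buffer rows from the first table before streaming rows from the second.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(tables):
    """Emit ((join_key, tag), row) for every row; tag is the table's position."""
    for tag, (rows, key_index) in enumerate(tables):
        for row in rows:
            yield (row[key_index], tag), row


def shuffle(pairs):
    """Sort by (join_key, tag) so each key's rows are adjacent and ordered by tag."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Inner join of two tagged inputs: buffer tag-0 rows, stream tag-1 rows."""
    for _join_key, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        buffered = []
        for (_, tag), row in group:
            if tag == 0:
                buffered.append(row)
            else:
                # Tag-1 rows arrive after all tag-0 rows for the same key.
                for left in buffered:
                    yield left + row


# Hypothetical sample data, not taken from the attached files.
items = [("apple", 1), ("pear", 2), ("plum", 9)]            # (name, product_id)
products = [(1, "fruit-a"), (2, "fruit-b"), (3, "fruit-c")]  # (product_id, label)

tables = [(items, 1), (products, 0)]  # (rows, index of the join column)
joined = list(reduce_phase(shuffle(map_phase(tables))))
```

The reducer is only correct if the shuffle delivers each key's rows grouped together and sorted by tag, which is the kind of MR-style shuffle guarantee this issue needs to investigate on the Spark side.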

Attachments

- sales_items.txt, 04/Aug/14 19:05, 0.2 kB, Brock Noland
- sales_products.txt, 04/Aug/14 19:05, 0.0 kB, Brock Noland
- sales_stores.txt, 04/Aug/14 19:05, 0.0 kB, Brock Noland
- Hive on Spark Reduce Side Join.docx, 21/Aug/14 01:56, 112 kB, Szehon Ho

Issue Links

contains

- HIVE-7815 Reduce Side Join with single reducer [Spark Branch] (Resolved)
- HIVE-7856 Enable parallelism in Reduce Side Join [Spark Branch] (Resolved)

depends upon

- SPARK-2978 Provide an MR-style shuffle transformation (Resolved)

People

Assignee: Szehon Ho

Reporter: Xuefu Zhang

Votes: 0

Watchers: 8

Dates

Created: 10/Jul/14 23:12

Updated: 29/May/15 02:29

Resolved: 08/Oct/14 18:37