[PIG-3847] Sort avoidance for group by and join - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: tez
Labels: None

Description

Group by and join only require that records with the same key be grouped together; the keys themselves do not need to be sorted. A Tez Input/Output implementation that does the grouping with a hash map (handling memory limits, spilling, and so on) could significantly speed up group by and join. Combiners on both the input and output sides could also be fast if serialization/deserialization is not required, and could then be used instead of POPartialAgg.
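As a rough illustration of the idea (not Tez or Pig code), the sketch below groups records with a hash map instead of sorting them, and folds values per key in memory the way a hash-based combiner might. The names `hash_group`, `hash_combine`, and `max_keys` are hypothetical, and flushing partial aggregates when the map is full is only a stand-in for real memory accounting and spilling.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Record = Tuple[Hashable, int]


def hash_group(records: Iterable[Record]) -> Dict[Hashable, List[int]]:
    """Group values by key using a hash map; no sort of the keys is performed."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def hash_combine(
    records: Iterable[Record],
    combine: Callable[[int, int], int],
    max_keys: int,
) -> Iterator[Record]:
    """Fold values per key in memory, working on live objects (no serialization).

    Assumption: when the map already holds `max_keys` distinct keys and a new
    key arrives, all partial aggregates are emitted downstream and the map is
    cleared. A real implementation would spill to disk or evict selectively.
    """
    partial: Dict[Hashable, int] = {}
    for key, value in records:
        if key in partial:
            partial[key] = combine(partial[key], value)
        else:
            if len(partial) >= max_keys:
                yield from partial.items()  # flush partial aggregates
                partial.clear()
            partial[key] = value
    yield from partial.items()  # emit whatever is left at end of input


if __name__ == "__main__":
    records = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]

    # Output side: combine with a tiny map so that a flush is triggered.
    partials = list(hash_combine(records, lambda x, y: x + y, max_keys=2))

    # Input side: group the partial aggregates by key, then finish the sum.
    totals = {k: sum(v) for k, v in hash_group(partials).items()}
    print(totals)
```

Because a key can be flushed more than once, the receiving side still has to merge partial aggregates for the same key; it only needs them grouped together, not sorted.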

People

Assignee: Unassigned

Reporter: Rohini Palaniswamy

Votes: 0

Watchers: 1

Dates

Created: 27/Mar/14 23:04

Updated: 09/Oct/14 20:28