[TEZ-1152] Optimize broadcast join for scalability - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- performance
- scalability

Description

There are two main issues for large queries that use broadcast shuffle:

1. Many tasks contact the same node to download shuffle data, so most of the time a single machine ends up overloaded with requests.

2. Tasks belonging to the same job on the same machine download the broadcast shuffle data redundantly. If the data were copied to temporary storage or a ramfs, other tasks running on that machine could read the local copy instead. Optimizing this would help when multiple queries run in parallel on the cluster.
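The local-copy idea in point 2 could look roughly like a per-node cache. The first task that needs a given broadcast output downloads it once into a shared local directory (for example temp storage or a ramfs mount). Any other task on the same node that asks for the same output waits for that download and reuses the file. This is a minimal, hypothetical sketch, not Tez code: `BroadcastLocalCache`, `Fetcher`, and the string key are illustrative assumptions. It also only deduplicates downloads among tasks that share one JVM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical node-local cache for broadcast shuffle data (not a Tez API).
 * The first caller for a key downloads the data once into a shared local
 * directory; concurrent and later callers in the same JVM reuse that file.
 */
public class BroadcastLocalCache {

    /** Hypothetical remote fetch, e.g. a download from the source node. */
    @FunctionalInterface
    public interface Fetcher {
        byte[] fetch(String key) throws IOException;
    }

    private final Path localDir; // assumed: a temp directory or a ramfs mount
    private final Fetcher fetcher;
    private final ConcurrentMap<String, CompletableFuture<Path>> entries =
            new ConcurrentHashMap<>();

    public BroadcastLocalCache(Path localDir, Fetcher fetcher) {
        this.localDir = localDir;
        this.fetcher = fetcher;
    }

    /** Returns a local file holding the data for key, fetching it at most once. */
    public Path get(String key) throws IOException {
        CompletableFuture<Path> mine = new CompletableFuture<>();
        CompletableFuture<Path> existing = entries.putIfAbsent(key, mine);
        if (existing == null) {
            // This caller registered first, so it performs the single remote download.
            try {
                byte[] data = fetcher.fetch(key);
                Path file = Files.createTempFile(localDir, "broadcast-", ".data");
                Files.write(file, data);
                mine.complete(file);
                return file;
            } catch (IOException | RuntimeException e) {
                entries.remove(key, mine);     // allow a later caller to retry
                mine.completeExceptionally(e); // wake up waiters with the failure
                throw e;
            }
        }
        // Another caller already started (or finished) this download: reuse its result.
        try {
            return existing.join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e;
        }
    }
}
```

Tasks on one node may run in separate container JVMs. A real implementation would therefore likely need cross-process coordination (for example file locks or a node-level service) and a cleanup policy for the cached files. Both are outside the scope of this sketch.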

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Unassigned

Reporter: Rajesh Balamohan

Votes: 0

Watchers: 4

Dates

Created: 27/May/14 04:33

Updated: 21/Jul/15 20:57