[DRILL-3381] Add option to distribute partition keys in CTAS - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: None
Labels: None

Description

The current CTAS implementation does not redistribute rows by partition key before writing, so each writer fragment creates its own file for every partition it encounters. This tends to result in a lot of extra files: the number of files can be larger by a factor equal to the number of fragments in the final stage of the query. Even on a moderately sized cluster, this number could easily be in the thousands, so a table with 100 different partitions would end up with hundreds of thousands of files.
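The arithmetic behind that estimate can be sketched as follows. This is only an illustration: the function names are hypothetical, and the figure of 2,000 final-stage fragments is an assumed value standing in for "in the thousands". It also assumes the worst case, where every fragment sees rows from every partition.

```python
def worst_case_file_count(num_partitions: int, num_fragments: int) -> int:
    """Upper bound on output files when every writer fragment
    receives rows for every partition (no redistribution)."""
    return num_partitions * num_fragments


def redistributed_file_count(num_partitions: int) -> int:
    """File count when all rows of a partition reach a single writer,
    assuming one file per partition."""
    return num_partitions


# Assumed values: 100 partitions (from the description), 2,000 fragments.
print(worst_case_file_count(100, 2000))   # 200000
print(redistributed_file_count(100))      # 100
```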

As a workaround, we should add an option that inserts an extra distribution step on the partition keys, so that all the rows for any given partition are written by the same writer.
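A minimal simulation of the idea, not Drill's actual implementation: rows are routed to a writer by hashing the partition key, so each partition maps to exactly one writer. The helper names, the use of `zlib.crc32` as the hash, and the toy data (10 partitions, 8 fragments) are all assumptions made for illustration.

```python
import zlib


def assign_writer(partition_key: str, num_writers: int) -> int:
    # crc32 is a stable stand-in for whatever hash the engine would use.
    return zlib.crc32(partition_key.encode("utf-8")) % num_writers


def count_files(rows, num_writers: int, redistribute: bool) -> int:
    """rows: iterable of (partition_key, source_fragment) pairs.
    Each distinct (writer, partition_key) pair corresponds to one file."""
    files = set()
    for key, fragment in rows:
        if redistribute:
            writer = assign_writer(key, num_writers)
        else:
            # Without redistribution, each fragment writes its own rows.
            writer = fragment % num_writers
        files.add((writer, key))
    return len(files)


# Toy data: every one of 8 fragments holds rows for every one of 10 partitions.
rows = [(f"p{p}", f) for p in range(10) for f in range(8)]

print(count_files(rows, 8, redistribute=False))  # 80
print(count_files(rows, 8, redistribute=True))   # 10
```

The reduction in file count comes at the price of an extra exchange of rows between fragments, and a skewed partition key can concentrate work on a few writers, which is one reason to expose this as an option rather than as default behavior.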

Attachments

DRILL-3381.patch, 26/Jun/15 00:51, 8 kB, Steven Phillips
DRILL-3381.patch, 26/Jun/15 00:57, 8 kB, Steven Phillips

People

Assignee: Steven Phillips

Reporter: Steven Phillips

Votes: 0

Watchers: 5

Dates

Created: 26/Jun/15 00:36

Updated: 22/Dec/17 15:54

Resolved: 22/Dec/17 15:54