[HIVE-3276] optimize union sub-queries - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.10.0
Fix Version/s: 0.10.0
Component/s: None
Labels: None

Description

It might be a good idea to optimize simple union queries in which at least one of the sub-queries contains a map-reduce job.

For example, a query like:

insert overwrite table T1 partition P1
select * from
(
subq1
union all
subq2
) u;

currently creates 3 map-reduce jobs: one for subq1, another for subq2, and
a final one for the union.

It might be a good idea to optimize this. Instead of creating the union
task, it might be simpler to create a move task (or something like a move
task), where the outputs of the two sub-queries will be moved to the final
directory. This can easily extend to more than 2 sub-queries in the union.
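
As a rough illustration of the query shape described above, here is a minimal sketch in HiveQL. The table names (src1, src2, t1), their columns, and the partition value are hypothetical. The two property names are taken from the linked issues (HIVE-8054 and HIVE-3643); that both need to be enabled for this optimization is an assumption, not something stated in this description.

```sql
-- Property names come from the linked issues; enabling both is an assumption.
set hive.optimize.union.remove=true;
set hive.mapred.supports.subdirectories=true;

-- Hypothetical tables:
--   src1(key string, val string), src2(key string, val string)
--   t1(key string, cnt bigint) partitioned by (ds string)
-- Each sub-query has a group by, so each needs its own map-reduce job.
insert overwrite table t1 partition (ds = '2012-10-30')
select * from
(
  select key, count(1) as cnt from src1 group by key
  union all
  select key, count(1) as cnt from src2 group by key
) u;
```

Prefixing the insert statement with explain should show whether a separate stage for the union is still present in the plan.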

This is most useful when the union is followed by a select * and then a
filesink. It is useful on its own, and can also be used to optimize
skewed joins:
https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.

If there is a select or a filter between the union and the filesink, the
select and the filter can be moved before the union (into each sub-query),
and the follow-up job can still be removed.
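
The rewrite of a select or filter can be pictured at the query level. The sketch below is only an illustration with hypothetical tables (src1, src2, t1) and a made-up predicate; the actual change would operate on the operator tree, not on query text. Because union all keeps every row from both branches, filtering after the union returns the same rows as filtering each branch first.

```sql
-- Hypothetical tables:
--   src1(key string, val string), src2(key string, val string)
--   t1(key string, cnt bigint) partitioned by (ds string)

-- As written: the filter sits between the union and the filesink.
insert overwrite table t1 partition (ds = '2012-10-30')
select u.key, u.cnt from
(
  select key, count(1) as cnt from src1 group by key
  union all
  select key, count(1) as cnt from src2 group by key
) u
where u.cnt > 10;

-- Equivalent form: the filter is applied inside each branch, so only
-- a select * and the filesink remain after the union.
insert overwrite table t1 partition (ds = '2012-10-30')
select * from
(
  select a.key, a.cnt from
    (select key, count(1) as cnt from src1 group by key) a
  where a.cnt > 10
  union all
  select b.key, b.cnt from
    (select key, count(1) as cnt from src2 group by key) b
  where b.cnt > 10
) u;
```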

Attachments

hive.3276.10.patch   03/Oct/12 08:37   479 kB   Namit Jain
hive.3276.11.patch   10/Oct/12 17:57   480 kB   Namit Jain
hive.3276.12.patch   12/Oct/12 15:10   480 kB   Namit Jain
hive.3276.13.patch   17/Oct/12 16:35   481 kB   Namit Jain
hive.3276.14.patch   29/Oct/12 11:04   498 kB   Namit Jain
hive.3276.2.patch    13/Aug/12 09:16   286 kB   Namit Jain
hive.3276.3.patch    16/Aug/12 18:31   312 kB   Namit Jain
hive.3276.4.patch    31/Aug/12 11:14   307 kB   Namit Jain
hive.3276.5.patch    20/Sep/12 06:02   365 kB   Namit Jain
hive.3276.6.patch    20/Sep/12 11:37   452 kB   Namit Jain
hive.3276.7.patch    21/Sep/12 04:07   452 kB   Namit Jain
hive.3276.8.patch    25/Sep/12 07:06   456 kB   Namit Jain
hive.3276.9.patch    29/Sep/12 16:33   456 kB   Namit Jain
HIVE-3276.1.patch    02/Aug/12 18:22   21 kB    Nadeem Moidu

Issue Links

depends upon

HIVE-3341 Making hive tests run against different MR versions (Closed)
HIVE-3544 union involving double column with a map join subquery will fail or give wrong results (Closed)

is depended upon by

HIVE-3380 As a follow up for HIVE-3276, optimize union for dynamic partition queries (Closed)

is related to

HIVE-8054 Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] (Resolved)
HIVE-3451 map-reduce jobs does not work for a partition containing sub-directories (Closed)
HIVE-3643 Hive List Bucketing - set hive.mapred.supports.subdirectories (Closed)


People

Assignee: Namit Jain

Reporter: Namit Jain

Votes: 0

Watchers: 12

Dates

Created: 19/Jul/12 04:42

Updated: 15/Sep/14 03:03

Resolved: 30/Oct/12 23:40