Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
Impala 2.3.0
Description
Equivalence classes can be used to avoid unnecessary exchanges when partitioning/fragmenting a plan. For example in TPC-H Q-13 the current plan includes
F03:PLAN FRAGMENT [HASH(c_custkey)] DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=10, HASH(c_count)] 04:AGGREGATE | output: count(*) | group by: count(o_orderkey) | hosts=10 per-host-mem=891.04MB | tuple-ids=4 row-size=16B cardinality=53086384 | 09:AGGREGATE [FINALIZE] | output: count:merge(o_orderkey) | group by: c_custkey | hosts=10 per-host-mem=89.10MB | tuple-ids=2 row-size=16B cardinality=53086384 | 08:EXCHANGE [HASH(c_custkey)] hosts=10 per-host-mem=0B tuple-ids=2 row-size=16B cardinality=53086384 F02:PLAN FRAGMENT [HASH(o_custkey)] DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=08, HASH(c_custkey)] 03:AGGREGATE | output: count(o_orderkey) | group by: c_custkey | hosts=10 per-host-mem=891.04MB | tuple-ids=2 row-size=16B cardinality=53086384 | 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | hash predicates: o_custkey = c_custkey | hosts=10 per-host-mem=37.77MB | tuple-ids=1N,0 row-size=88B cardinality=405000000 | |--07:EXCHANGE [HASH(c_custkey)] | hosts=10 per-host-mem=0B | tuple-ids=0 row-size=8B cardinality=45000000 | 06:EXCHANGE [HASH(o_custkey)] hosts=10 per-host-mem=0B tuple-ids=1 row-size=80B cardinality=405000000
These two fragments have equivalent input partitioning since o_custkey = c_custkey. The fragment below shows what the might be done instead.
F02:PLAN FRAGMENT [HASH(c_custkey)] DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=10, HASH(c_count)] 04:AGGREGATE | output: count(*) | group by: count(o_orderkey) | hosts=10 per-host-mem=891.04MB | tuple-ids=4 row-size=16B cardinality=53086384 | 09:AGGREGATE [FINALIZE] | output: count(o_orderkey) | group by: c_custkey | hosts=10 per-host-mem=891.04MB | tuple-ids=2 row-size=16B cardinality=53086384 | 02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | hash predicates: o_custkey = c_custkey | hosts=10 per-host-mem=37.77MB | tuple-ids=1N,0 row-size=88B cardinality=405000000 | |--07:EXCHANGE [HASH(c_custkey)] | hosts=10 per-host-mem=0B | tuple-ids=0 row-size=8B cardinality=45000000 | 06:EXCHANGE [HASH(o_custkey)] hosts=10 per-host-mem=0B tuple-ids=1 row-size=80B cardinality=405000000