Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The basic idea is to apply join predicates early in a plan in order to reduce the size of intermediate query results and, thus, reduce the cost of other operations. In other words, the idea is to apply the same join predicates twice or more often in a query plan
In order to reduce the communication costs of a distributed system. Obviously, semi-join reducers are only effective if the (redundant) semi-joins are cheap and result in a significant reduction of the size of intermediate
query results.
I propose to extend a query optimizer and integrate semi-join reducer and
join-ordering, etc. into a single query optimization step
Several TPC-DS queries like 24, 64 & 80 run very slow do to the lake of semi join reduction optimization in Calcite.
Doing a rewrite of Q64 to simulate semi join reduction produced 4x gains.
Query Total time CPU Intermediate rows (Million) Baseline 1,377 356,900 23,940 Semi Join Reduction 343 47,253 23
Q64 subset
select count(*) FROM store_sales JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk JOIN (select cs_item_sk from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk group by cs_item_sk having sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE i_color in ('maroon' , 'burnished', 'dim', 'steel', 'navajo', 'chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15
Attachments
Attachments
Issue Links
- is related to
-
CALCITE-1317 Rewrite "MAX" as "ORDER BY ... LIMIT ..."
- Open
-
CALCITE-481 Add "Spool" operator, to allow re-use of relational expressions
- Closed