Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
None
Description
Currently, we allow only single output distribution for aggregates, but looks like if we have hash input distribution and all grouping set contains all of the distribution keys we can make aggregation on remote nodes and produce hash output distribution with the same keys. This will reduce memory consumption on the initiator node and make some other optimizations possible.
For example, query:
SELECT t1.aff_key, t2.cnt FROM t1 JOIN (SELECT aff_key, COUNT(*) AS cnt FROM t2 GROUP BY id) AS t2 ON t1.aff_key = t2.aff_key
Can do colocated join if both tables are colocated on aff_key. Currently, such a query does join on the initiator node.
The same for set-ops (EXCEPT, INTERSECT).
Attachments
Issue Links
- is part of
-
IGNITE-12248 Apache Calcite based query execution engine
- Open
- links to