Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
We have a lot of queries like the following to retrieve top N tuples per group:
SELECT x, y FROM (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) AS rn FROM t1) t2 WHERE rn <= 3;
The performance is not good if each group has a lot more tuples than wanted, because we will retrieve and sort all the tuples, instead of just doing a top-N heap sort.
In order to do optimization for this kind of query, we need to extend window to support limit, if and only if there is only 1 window function, and it is row_number(). We also need a substitute rule to push the limit into window. Of course, we also need to modify executor to support this optimization (can be later).
Filter (rn <= 3) +- Window (window#0={Partition by x order by y ROW_NUMBER()})
to
Filter (rn <= 3) +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
Thoughts? Objections?
Attachments
Issue Links
- is related to
-
CALCITE-1317 Rewrite "MAX" as "ORDER BY ... LIMIT ..."
- Open
-
CALCITE-4687 Add LIMIT to WITHIN GROUP clause of aggregate functions
- Open