Description
The current rule can only push down the join condition from one side to table scan. If the join condition comes from both sides, it cannot push down the filter, but extracts the relevant columns to Join through scan for calculation.
SQL Query:
SELECT A.name AS cName, B.name AS fName FROM vehicles AS A JOIN dimTable as B ON ST_Contains(B.fence, ST_MakePoint(A.x, A.y));
Query Plan:
== Abstract Syntax Tree == LogicalProject(cName=[$1], fName=[$5]) +- LogicalJoin(condition=[ST_Contains($6, ST_MakePoint($2, $3))], joinType=[inner]) :- LogicalTableScan(table=[[default_catalog, default_database, vehicles]]) +- LogicalTableScan(table=[[default_catalog, default_database, dimTable]])
The same query in postgres behaves as follows (w/ & w/o index):
Nested Loop (cost=0.00..18071570.38 rows=722 width=64)
Join Filter: st_contains(b.fence, a.location)
-> Seq Scan on vehicles a (cost=0.00..18.50 rows=850 width=64)
-> Materialize (cost=0.00..22.75 rows=850 width=64)
-> Seq Scan on fences b (cost=0.00..18.50 rows=850 width=64)
# Using GIST Index
Nested Loop (cost=0.13..84.50 rows=1 width=64)
-> Seq Scan on fences b (cost=0.00..1.03 rows=3 width=64)
-> Index Scan using g_idx on vehicles a (cost=0.13..27.81 rows=1 width=64)
Index Cond: (location @ b.fence)
Filter: st_contains(b.fence, location)
We created an in-memory based index in TableScan, is there a way to convert the spatial join condition to NestedLoopJoin (LogicalCorrelate) and push down to the TableScan node?
If Calcite does not support such a design, can I ask what are the considerations/concerns?