[SPARK-36117] Join can become unresolved after PullupCorrelatedPredicates - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: SQL
Labels: None

Description

Join can become unresolved after PullupCorrelatedPredicates:

create view t1(c1, c2) as values (0, 1), (1, 2);
create view t2(c1, c2) as values (0, 2), (0, 3);

select (
  select sum(l.cnt + r.cnt)
  from (select count(*) cnt from t2 where t1.c1 = t2.c1 having cnt = 0) l
  join (select count(*) cnt from t2 where t1.c1 = t2.c1 having cnt = 0) r
  on l.cnt = r.cnt
) from t1;

== Optimized Logical Plan ==
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '(' expecting {<EOF>, '.', '-'}(line 1, pos 14)

== SQL ==
scalarsubquery(c1, c1)
--------------^^^

This is because duplicate attributes are not handled correctly when pulling up correlated predicates over joins. Both `pullOutCorrelatedPredicates` and `DecorrelateInnerQuery` are subject to this issue.
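To illustrate the failure mode, here is a minimal toy model in plain Python, not Spark code. It assumes that a join only counts as resolved when its left and right outputs share no attribute IDs, and that pulling the correlated predicate up makes both join children expose the same `t2.c1` attribute. The names `Attr`, `join_duplicate_resolved`, and `dedup_right`, as well as the concrete IDs, are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for a plan attribute: a name plus a unique expression ID."""
    name: str
    expr_id: int


def join_duplicate_resolved(left_output, right_output):
    """Return True if the two join children expose disjoint attribute IDs."""
    left_ids = {a.expr_id for a in left_output}
    right_ids = {a.expr_id for a in right_output}
    return left_ids.isdisjoint(right_ids)


def dedup_right(left_output, right_output, next_id):
    """Give conflicting right-side attributes fresh IDs (one possible remedy, assumed)."""
    left_ids = {a.expr_id for a in left_output}
    fresh = []
    for a in right_output:
        if a.expr_id in left_ids:
            fresh.append(Attr(a.name, next_id))
            next_id += 1
        else:
            fresh.append(a)
    return fresh


# Before pull-up: each side of the join only outputs its own `cnt`.
cnt_l, cnt_r = Attr("cnt", 10), Attr("cnt", 11)
before_left, before_right = [cnt_l], [cnt_r]

# After pull-up (assumed): both sides additionally output the same t2.c1
# attribute so that the predicate t1.c1 = t2.c1 can be evaluated above them.
t2_c1 = Attr("c1", 2)
after_left, after_right = [cnt_l, t2_c1], [cnt_r, t2_c1]

print(join_duplicate_resolved(before_left, before_right))  # True
print(join_duplicate_resolved(after_left, after_right))    # False
print(join_duplicate_resolved(
    after_left, dedup_right(after_left, after_right, next_id=100)))  # True
```

In this model the join stops being resolved as soon as both children carry the shared attribute, and re-aliasing one side restores it. Whether the actual fix should take that form is not stated in this issue.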

People

Assignee: Unassigned
Reporter: Allison Wang
Votes: 0
Watchers: 1

Dates

Created: 13/Jul/21 04:45
Updated: 13/Jul/21 04:45