  1. Spark
  2. SPARK-35553 Improve correlated subqueries
  3. SPARK-36117

Join can become unresolved after PullupCorrelatedPredicates

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Join can become unresolved after PullupCorrelatedPredicates:

      create view t1(c1, c2) as values (0, 1), (1, 2)
      create view t2(c1, c2) as values (0, 2), (0, 3)
      
      select (
        select sum(l.cnt + r.cnt)
        from (select count(*) cnt from t2 where t1.c1 = t2.c1 having cnt = 0) l
        join (select count(*) cnt from t2 where t1.c1 = t2.c1 having cnt = 0) r
        on l.cnt = r.cnt
      ) from t1
      
      == Optimized Logical Plan ==
      org.apache.spark.sql.catalyst.parser.ParseException:
      mismatched input '(' expecting {<EOF>, '.', '-'}(line 1, pos 14)
      
      == SQL ==
      scalarsubquery(c1, c1)
      --------------^^^
      

      This is because duplicate attributes are not handled correctly when pulling up correlated predicates over joins. Both `pullOutCorrelatedPredicates` and `DecorrelateInnerQuery` are subject to this issue.
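To make the failure mode concrete, here is a minimal toy model in Python, not Spark code. It assumes, as Catalyst's `Join` does, that a join only counts as resolved when the output attributes of its two children do not share expression IDs. `Attribute`, `Relation`, `Join`, `fresh`, and `dedup_right` are hypothetical names invented for this sketch, and the shared `c1` attribute stands in for whatever duplicate the pull-up exposes on both sides.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # toy stand-in for an expression-ID generator


@dataclass(frozen=True)
class Attribute:
    name: str
    expr_id: int


def fresh(name):
    """Create an attribute with a new, unique expression ID."""
    return Attribute(name, next(_next_id))


@dataclass(frozen=True)
class Relation:
    output: tuple  # tuple of Attribute


@dataclass(frozen=True)
class Join:
    left: object
    right: object

    @property
    def output(self):
        return self.left.output + self.right.output

    @property
    def duplicate_resolved(self):
        # The join is only resolvable if no expression ID appears on both sides.
        left_ids = {a.expr_id for a in self.left.output}
        right_ids = {a.expr_id for a in self.right.output}
        return not (left_ids & right_ids)


def dedup_right(join):
    """Give conflicting right-side attributes fresh IDs (names are kept)."""
    left_ids = {a.expr_id for a in join.left.output}
    new_output = tuple(
        fresh(a.name) if a.expr_id in left_ids else a
        for a in join.right.output
    )
    return Join(join.left, Relation(new_output))


# Both join children end up exposing the same attribute (same expression ID),
# which is the situation the description attributes to the pull-up.
c1 = fresh("c1")
left = Relation((fresh("cnt"), c1))
right = Relation((fresh("cnt"), c1))

join = Join(left, right)
fixed = dedup_right(join)
```

In this sketch `join.duplicate_resolved` is false and `fixed.duplicate_resolved` is true. A real fix would also have to rewrite every reference to the renamed attributes (the join condition and any parent operators), which the toy model leaves out.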

          People

            Assignee: Unassigned
            Reporter: Allison Wang (allisonwang-db)