[SPARK-34581] BoundAttribute issue after optimization by BooleanSimplification and PushFoldableIntoBranches

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.2, 3.1.1, 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Attribute binding (BoundAttribute) fails after the plan has been optimized by BooleanSimplification and PushFoldableIntoBranches. To reproduce:

      spark.sql("CREATE TABLE t1 (a INT, b INT) USING parquet")
      spark.sql("CREATE TABLE t2 (a INT, b INT) USING parquet")
      
      spark.sql(
        """
          |SELECT cnt,
          |       NOT ( buyer_id ) AS buyer_id2
          |FROM   (SELECT t1.a IS NOT NULL AS buyer_id,
          |               Count(*)         AS cnt
          |        FROM   t1
          |               INNER JOIN t2
          |                       ON t1.a = t2.a
          |        GROUP  BY 1) t
          |""".stripMargin).collect()
      
      Couldn't find a#4 in [CASE WHEN isnotnull(a#4) THEN 1 ELSE 2 END#10,count(1)#3L]
      java.lang.IllegalStateException: Couldn't find a#4 in [CASE WHEN isnotnull(a#4) THEN 1 ELSE 2 END#10,count(1)#3L]
      	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
      	at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:316)
      	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:316)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:321)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
      	at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
      

      Another case:

      spark.sql(
        """
          |SELECT cnt,
          |       CASE WHEN ( buyer_id = 2 AND cnt > 3 ) THEN 2 ELSE 3 END AS buyer_id2
          |FROM   (SELECT CASE WHEN t1.a IS NOT NULL THEN 1 ELSE 2 END AS buyer_id, Count(*) AS cnt
          |        FROM   t1 INNER JOIN t2 ON t1.a = t2.a
          |        GROUP  BY 1) t
          |""".stripMargin).collect()
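
      One way to check whether the two rules named in the title are involved is to exclude them through the `spark.sql.optimizer.excludedRules` setting and rerun a failing query. The snippet below is a diagnostic sketch only. It assumes a spark-shell session where `spark` is defined and the tables t1 and t2 exist as created above. It is not the fix applied for this issue, and the report does not confirm whether the query succeeds with the rules excluded.

```scala
// Diagnostic sketch (assumption: spark-shell session, t1 and t2 already created).
// Exclude the two optimizer rules named in the issue title.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  Seq(
    "org.apache.spark.sql.catalyst.optimizer.BooleanSimplification",
    "org.apache.spark.sql.catalyst.optimizer.PushFoldableIntoBranches"
  ).mkString(","))

// Rebuild the first reproduction query without executing it yet.
val df = spark.sql(
  """
    |SELECT cnt,
    |       NOT ( buyer_id ) AS buyer_id2
    |FROM   (SELECT t1.a IS NOT NULL AS buyer_id,
    |               Count(*)         AS cnt
    |        FROM   t1
    |               INNER JOIN t2
    |                       ON t1.a = t2.a
    |        GROUP  BY 1) t
    |""".stripMargin)

// Print the parsed, analyzed, optimized and physical plans for comparison
// against a run with the rules enabled.
df.explain(true)
```

      Comparing the optimized plan with and without the exclusion shows how the outer projection and the grouping expression are rewritten, which is where the unresolved a#4 reference in the error message comes from.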
      

          People

            Assignee: Peter Toth (petertoth)
            Reporter: Yuming Wang (yumwang)