[SPARK-31590] Metadata-only queries should not include subquery in partition filters - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Trivial
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.6, 3.0.0
Component/s: SQL
Labels: None

Description

When the SPARK-23877 change is in use, some SQL queries fail at execution time.

code:

        // Enable the metadata-only query optimization.
        sql("set spark.sql.optimizer.metadataOnly=true")
        // Parquet table partitioned by the string columns d and h.
        sql("CREATE TABLE test_tbl (a INT, d STRING, h STRING) USING PARQUET PARTITIONED BY (d, h)")
        // Create three partitions.
        sql("""
            |INSERT OVERWRITE TABLE test_tbl PARTITION(d, h)
            |SELECT 1, '2020-01-01', '23'
            |UNION ALL
            |SELECT 2, '2020-01-02', '01'
            |UNION ALL
            |SELECT 3, '2020-01-02', '02'
            """.stripMargin)
        // The filter on partition column d contains a scalar subquery; collect() throws.
        sql("""
            |SELECT d, MAX(h) AS h
            |FROM test_tbl
            |WHERE d = (
            |  SELECT MAX(d) AS d
            |  FROM test_tbl
            |)
            |GROUP BY d
            """.stripMargin).collect()

Exception:

java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#48 []

...
at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.prunePartitions(PartitioningAwareFileIndex.scala:180)
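
The reproduction above only fails with `spark.sql.optimizer.metadataOnly` enabled, so on an affected version a likely workaround is to turn that setting off before running the query. This is a sketch under that assumption, using the same `sql` helper and table as the reproduction; it has not been verified against every affected release.

```scala
// Assumed workaround: disable the metadata-only optimization for this session,
// so the partition filter containing the scalar subquery is not evaluated
// while the optimizer prunes partitions.
sql("set spark.sql.optimizer.metadataOnly=false")

sql("""
    |SELECT d, MAX(h) AS h
    |FROM test_tbl
    |WHERE d = (
    |  SELECT MAX(d) AS d
    |  FROM test_tbl
    |)
    |GROUP BY d
    """.stripMargin).collect()
```

The trade-off is that queries touching only partition columns scan the table's files again instead of being answered from partition metadata.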

optimizedPlan:

Aggregate [d#245], [d#245, max(h#246) AS h#243]
+- Project [d#245, h#246]
   +- Filter (isnotnull(d#245) AND (d#245 = scalar-subquery#242 []))
      :  +- Aggregate [max(d#245) AS d#241]
      :     +- LocalRelation <empty>, [d#245]
      +- Relation[a#244,d#245,h#246] parquet
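
The plan shows a scalar subquery that has not been executed yet sitting inside a filter on the partition column `d`. The toy model below illustrates why evaluating such a filter during partition pruning fails, and what the issue title proposes: keep predicates that contain a subquery out of the partition filters. Every type and function name here is a hypothetical stand-in written for this sketch, not Spark's Catalyst API, and the actual change in pull request #28383 may be structured differently.

```scala
import scala.util.Try

// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object PartitionFilterSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // True when this node or any descendant is a not-yet-executed subquery.
    def containsSubquery: Boolean = children.exists(_.containsSubquery)
  }
  final case class Attr(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class IsNotNull(child: Expr) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class ScalarSubquery(id: Int) extends Expr {
    def children: Seq[Expr] = Nil
    override def containsSubquery: Boolean = true
  }

  // A partition is modeled as its partition-column values.
  type Partition = Map[String, String]

  // Evaluates an expression against one partition's column values.
  def eval(e: Expr, p: Partition): Any = e match {
    case Attr(n)       => p.getOrElse(n, null)
    case IsNotNull(c)  => eval(c, p) != null
    case EqualTo(l, r) => eval(l, p) == eval(r, p)
    case ScalarSubquery(id) =>
      // The subquery has no value until it has been executed.
      throw new UnsupportedOperationException(
        s"Cannot evaluate expression: scalar-subquery#$id []")
  }

  // Keeps the partitions for which every filter evaluates to true.
  def prunePartitions(parts: Seq[Partition], filters: Seq[Expr]): Seq[Partition] =
    parts.filter(p => filters.forall(f => eval(f, p) == true))

  // Splits filters into (usable for pruning, deferred because of a subquery).
  def splitFilters(filters: Seq[Expr]): (Seq[Expr], Seq[Expr]) =
    filters.partition(f => !f.containsSubquery)

  // The three partitions created by the reproduction.
  val partitions: Seq[Partition] = Seq(
    Map("d" -> "2020-01-01", "h" -> "23"),
    Map("d" -> "2020-01-02", "h" -> "01"),
    Map("d" -> "2020-01-02", "h" -> "02"))

  // isnotnull(d) AND (d = scalar-subquery#242), as in the optimized plan.
  val filters: Seq[Expr] =
    Seq(IsNotNull(Attr("d")), EqualTo(Attr("d"), ScalarSubquery(242)))

  def main(args: Array[String]): Unit = {
    // Pruning with all filters hits the unevaluated subquery.
    println(Try(prunePartitions(partitions, filters)).isFailure) // prints true

    // Pruning with subquery-free filters succeeds.
    val (prunable, deferred) = splitFilters(filters)
    println(prunePartitions(partitions, prunable).size) // prints 3
    println(deferred.size)                              // prints 1
  }
}
```

In this model the deferred predicate is not dropped: it would still have to be applied to the surviving rows once the subquery has produced a value, so pruning becomes less selective but the result stays correct.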

Issue Links

links to

[Github] Pull Request #28383 (cxzl25)

People

Assignee: dzcxzl

Reporter: dzcxzl

Votes: 0

Watchers: 2

Dates

Created: 28/Apr/20 04:04

Updated: 12/Dec/22 18:11

Resolved: 06/May/20 01:57