Description
Spark SQL does not support a SQL statement in which an aggregate's FILTER clause contains an IN subquery, as shown below:
select sum(unique1) FILTER (WHERE unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
Spark throws the following exception:
org.apache.spark.sql.AnalysisException
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
:  +- Project [unique1#x]
:     +- Filter (unique1#x < 100)
:        +- SubqueryAlias `onek`
:           +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
+- SubqueryAlias `tenk1`
   +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data
But PostgreSQL supports this syntax:
select sum(unique1) FILTER (WHERE unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
 sum
------
 4950
(1 row)
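Until Spark supports this form, a possible workaround is to keep the IN predicate inside a regular Filter, or to evaluate the membership test through a join, both of which Spark's analyzer already accepts. The rewrites below are my own sketch against the same onek/tenk1 test tables, not something proposed in this ticket:

-- Rewrite 1: move the IN predicate into the WHERE clause.
-- Only equivalent here because the query has a single aggregate and
-- no other aggregates that must see the unfiltered rows.
SELECT sum(unique1)
FROM tenk1
WHERE unique1 IN (SELECT unique1 FROM onek WHERE unique1 < 100);

-- Rewrite 2: evaluate membership via a join and sum conditionally.
-- DISTINCT prevents row duplication; sum() ignores the NULLs produced
-- for rows of tenk1 that have no match in onek.
SELECT sum(CASE WHEN m.unique1 IS NOT NULL THEN t.unique1 END) AS filtered_sum
FROM tenk1 t
LEFT JOIN (SELECT DISTINCT unique1 FROM onek WHERE unique1 < 100) m
  ON t.unique1 = m.unique1;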
beliefer, did you check the comments of its parent JIRA? It would be better to check other DBMSes too.