Description
Spark SQL does not support a SQL statement in which an aggregate's FILTER clause contains an IN subquery, as shown below:
select sum(unique1) FILTER (WHERE unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
Spark throws the following exception:
org.apache.spark.sql.AnalysisException
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
:  +- Project [unique1#x]
:     +- Filter (unique1#x < 100)
:        +- SubqueryAlias `onek`
:           +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
+- SubqueryAlias `tenk1`
   +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, stringu1#x, stringu2#x, string4#x] csv file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data
But PostgreSQL supports this syntax:
select sum(unique1) FILTER (WHERE unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
 sum
------
 4950
(1 row)
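Until Spark supports this form, a possible workaround is to keep the IN predicate inside a regular Filter, or to evaluate the membership test through a join, both of which Spark's analyzer already accepts. The rewrites below are my own sketch against the same onek/tenk1 test tables, not something proposed in this ticket:

-- Rewrite 1: move the IN predicate into the WHERE clause.
-- Only equivalent here because the query has a single aggregate and
-- no other aggregates that must see the unfiltered rows.
SELECT sum(unique1)
FROM tenk1
WHERE unique1 IN (SELECT unique1 FROM onek WHERE unique1 < 100);

-- Rewrite 2: evaluate membership via a join and sum conditionally.
-- DISTINCT prevents row duplication; sum() ignores the NULLs produced
-- for rows of tenk1 that have no match in onek.
SELECT sum(CASE WHEN m.unique1 IS NOT NULL THEN t.unique1 END) AS filtered_sum
FROM tenk1 t
LEFT JOIN (SELECT DISTINCT unique1 FROM onek WHERE unique1 < 100) m
  ON t.unique1 = m.unique1;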
beliefer, did you check the comments of its parent JIRA? It would be better to check other DBMSes too.