Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: None
Description
Spark SQL does not support a SQL statement with a nested aggregate such as the following:
SELECT sum(salary),
       row_number() OVER (ORDER BY depname),
       sum(sum(salary) FILTER (WHERE enroll_date > '2007-01-01'))
         FILTER (WHERE depname <> 'sales')
         OVER (ORDER BY depname DESC) AS "filtered_sum",
       depname
FROM empsalary
GROUP BY depname;
Spark throws the following exception:
org.apache.spark.sql.AnalysisException
It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.
PostgreSQL, however, supports this syntax:
SELECT sum(salary),
       row_number() OVER (ORDER BY depname),
       sum(sum(salary) FILTER (WHERE enroll_date > '2007-01-01'))
         FILTER (WHERE depname <> 'sales')
         OVER (ORDER BY depname DESC) AS "filtered_sum",
       depname
FROM empsalary
GROUP BY depname;

  sum  | row_number | filtered_sum |  depname
-------+------------+--------------+-----------
 25100 |          1 |        22600 | develop
  7400 |          2 |         3500 | personnel
 14600 |          3 |              | sales
(3 rows)
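For reference, the rewrite that the error message suggests can be expressed in Spark SQL by computing the inner aggregate in a subquery and applying the window aggregate on top. The following is a minimal sketch, not part of the original report: it assumes the empsalary schema from the PostgreSQL window-function tutorial (depname, salary, enroll_date), and it replaces the FILTER clauses with equivalent CASE expressions, since FILTER on a window aggregate may not be accepted by the Spark version in question.

-- Sketch of the subquery workaround suggested by the error message.
-- Assumes an empsalary table shaped like the PostgreSQL tutorial's:
--   (depname string, empno int, salary int, enroll_date date).
-- CASE WHEN stands in for FILTER; sum() skips NULLs, so the
-- semantics match (an all-filtered frame still yields NULL).
SELECT sum_salary,
       row_number() OVER (ORDER BY depname) AS row_number,
       sum(CASE WHEN depname <> 'sales' THEN filtered_salary END)
         OVER (ORDER BY depname DESC) AS filtered_sum,
       depname
FROM (
  -- Inner aggregates computed per department in a subquery.
  SELECT depname,
         sum(salary) AS sum_salary,
         sum(CASE WHEN enroll_date > '2007-01-01' THEN salary END)
           AS filtered_salary
  FROM empsalary
  GROUP BY depname
) grouped;

This produces the same rows as the PostgreSQL result above, but the point of this issue is that the nested-aggregate form should work directly, without requiring the manual rewrite.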