[SPARK-29699] Different answers in nested aggregates with window functions - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: SQL
Labels:
None

Description

A nested aggregate below with a window function seems to have different answers in the `rsum` column between PgSQL and Spark;

postgres=# create table gstest2 (a integer, b integer, c integer, d integer, e integer, f integer, g integer, h integer);
postgres=# insert into gstest2 values
postgres-#   (1, 1, 1, 1, 1, 1, 1, 1),
postgres-#   (1, 1, 1, 1, 1, 1, 1, 2),
postgres-#   (1, 1, 1, 1, 1, 1, 2, 2),
postgres-#   (1, 1, 1, 1, 1, 2, 2, 2),
postgres-#   (1, 1, 1, 1, 2, 2, 2, 2),
postgres-#   (1, 1, 1, 2, 2, 2, 2, 2),
postgres-#   (1, 1, 2, 2, 2, 2, 2, 2),
postgres-#   (1, 2, 2, 2, 2, 2, 2, 2),
postgres-#   (2, 2, 2, 2, 2, 2, 2, 2);
INSERT 0 9
postgres=# 
postgres=# select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
postgres-#   from gstest2 group by rollup (a,b) order by rsum, a, b;
 a | b | sum | rsum 
---+---+-----+------
 1 | 1 |   8 |    8
 1 | 2 |   2 |   10
 1 |   |  10 |   20
 2 | 2 |   2 |   22
 2 |   |   2 |   24
   |   |  12 |   36
(6 rows)

scala> sql("""
     | select a, b, sum(c), sum(sum(c)) over (order by a,b) as rsum
     |   from gstest2 group by rollup (a,b) order by rsum, a, b
     | """).show()
+----+----+------+----+                                                         
|   a|   b|sum(c)|rsum|
+----+----+------+----+
|null|null|    12|  12|
|   1|null|    10|  22|
|   1|   1|     8|  30|
|   1|   2|     2|  32|
|   2|null|     2|  34|
|   2|   2|     2|  36|
+----+----+------+----+

Attachments

Issue Links

is related to

SPARK-29708 Different answers in aggregates of duplicate grouping sets

Resolved

Activity

People

Assignee:: Unassigned

Reporter:: Takeshi Yamamuro

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 01/Nov/19 04:17

Updated:: 12/Dec/22 18:10