Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
Description
The query below, which lists the same grouping set twice in GROUPING SETS, returns different answers in PostgreSQL and Spark:
postgres=# create table gstest4(id integer, v integer, unhashable_col bit(4), unsortable_col xid);
postgres=# insert into gstest4
postgres-# values (1,1,b'0000','1'), (2,2,b'0001','1'),
postgres-#   (3,4,b'0010','2'), (4,8,b'0011','2'),
postgres-#   (5,16,b'0000','2'), (6,32,b'0001','2'),
postgres-#   (7,64,b'0010','1'), (8,128,b'0011','1');
INSERT 0 8
postgres=# select unsortable_col, count(*)
postgres-# from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
postgres-# order by text(unsortable_col);
 unsortable_col | count
----------------+-------
 1              |     8
 1              |     8
 2              |     8
 2              |     8
(4 rows)
scala> sql("""create table gstest4(id integer, v integer, unhashable_col /* bit(4) */ byte, unsortable_col /* xid */ integer) using parquet""")

scala> sql("""
     | insert into gstest4
     | values (1,1,tinyint('0'),1), (2,2,tinyint('1'),1),
     |   (3,4,tinyint('2'),2), (4,8,tinyint('3'),2),
     |   (5,16,tinyint('0'),2), (6,32,tinyint('1'),2),
     |   (7,64,tinyint('2'),1), (8,128,tinyint('3'),1)
     | """)
res21: org.apache.spark.sql.DataFrame = []

scala> sql("""
     | select unsortable_col, count(*)
     | from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
     | order by string(unsortable_col)
     | """).show
+--------------+--------+
|unsortable_col|count(1)|
+--------------+--------+
|             1|       8|
|             2|       8|
+--------------+--------+
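For context on the expected semantics: the SQL standard defines GROUPING SETS as equivalent to a UNION ALL of one GROUP BY per listed set, so a duplicated set should contribute its group rows twice, as PostgreSQL does above, while Spark appears to deduplicate the sets. A minimal Python sketch of that expansion, using integer stand-ins for the bit/xid columns and a hypothetical helper name (the counts below reflect this toy data, not the transcripts above):

```python
from collections import Counter

# Rows mirroring the gstest4 data: (id, v, unhashable_col, unsortable_col)
rows = [
    (1, 1, 0, 1), (2, 2, 1, 1),
    (3, 4, 2, 2), (4, 8, 3, 2),
    (5, 16, 0, 2), (6, 32, 1, 2),
    (7, 64, 2, 1), (8, 128, 3, 1),
]

def grouping_sets_count(rows, key_fn, sets):
    """Model GROUPING SETS as a UNION ALL of one GROUP BY per set.

    A duplicated set therefore contributes its groups once per occurrence,
    rather than being deduplicated.
    """
    out = []
    for _ in sets:
        counts = Counter(key_fn(r) for r in rows)
        out.extend(sorted(counts.items()))
    return out

# Two identical sets over unsortable_col, matching the reported query shape.
result = grouping_sets_count(
    rows, lambda r: r[3], [("unsortable_col",), ("unsortable_col",)]
)
# Each group appears twice because the set is listed twice.
print(result)
```

Under this model the duplicated set yields each group twice, which matches PostgreSQL's four output rows; Spark's two-row result suggests the duplicate set is being collapsed.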
Issue Links
- relates to
  - SPARK-29699 Different answers in nested aggregates with window functions (Open)
  - SPARK-29701 Different answers when empty input given in GROUPING SETS (Closed)