Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
Description
The query below, which lists the same grouping set twice in GROUPING SETS, returns different answers in PostgreSQL and Spark:
postgres=# create table gstest4(id integer, v integer, unhashable_col bit(4), unsortable_col xid);
postgres=# insert into gstest4
postgres-# values (1,1,b'0000','1'), (2,2,b'0001','1'),
postgres-#   (3,4,b'0010','2'), (4,8,b'0011','2'),
postgres-#   (5,16,b'0000','2'), (6,32,b'0001','2'),
postgres-#   (7,64,b'0010','1'), (8,128,b'0011','1');
INSERT 0 8
postgres=# select unsortable_col, count(*)
postgres-# from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
postgres-# order by text(unsortable_col);
 unsortable_col | count
----------------+-------
 1              |     8
 1              |     8
 2              |     8
 2              |     8
(4 rows)
scala> sql("""create table gstest4(id integer, v integer, unhashable_col /* bit(4) */ byte, unsortable_col /* xid */ integer) using parquet""")

scala> sql("""
     | insert into gstest4
     | values (1,1,tinyint('0'),1), (2,2,tinyint('1'),1),
     |   (3,4,tinyint('2'),2), (4,8,tinyint('3'),2),
     |   (5,16,tinyint('0'),2), (6,32,tinyint('1'),2),
     |   (7,64,tinyint('2'),1), (8,128,tinyint('3'),1)
     | """)
res21: org.apache.spark.sql.DataFrame = []

scala> sql("""
     | select unsortable_col, count(*)
     | from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
     | order by string(unsortable_col)
     | """).show
+--------------+--------+
|unsortable_col|count(1)|
+--------------+--------+
|             1|       8|
|             2|       8|
+--------------+--------+
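For context on the expected semantics: the SQL standard defines GROUPING SETS as equivalent to a UNION ALL of one GROUP BY per listed set, so a duplicated set should contribute its group rows twice, as PostgreSQL does above, while Spark appears to deduplicate the sets. A minimal Python sketch of that expansion, using integer stand-ins for the bit/xid columns and a hypothetical helper name (the counts below reflect this toy data, not the transcripts above):

```python
from collections import Counter

# Rows mirroring the gstest4 data: (id, v, unhashable_col, unsortable_col)
rows = [
    (1, 1, 0, 1), (2, 2, 1, 1),
    (3, 4, 2, 2), (4, 8, 3, 2),
    (5, 16, 0, 2), (6, 32, 1, 2),
    (7, 64, 2, 1), (8, 128, 3, 1),
]

def grouping_sets_count(rows, key_fn, sets):
    """Model GROUPING SETS as a UNION ALL of one GROUP BY per set.

    A duplicated set therefore contributes its groups once per occurrence,
    rather than being deduplicated.
    """
    out = []
    for _ in sets:
        counts = Counter(key_fn(r) for r in rows)
        out.extend(sorted(counts.items()))
    return out

# Two identical sets over unsortable_col, matching the reported query shape.
result = grouping_sets_count(
    rows, lambda r: r[3], [("unsortable_col",), ("unsortable_col",)]
)
# Each group appears twice because the set is listed twice.
print(result)
```

Under this model the duplicated set yields each group twice, which matches PostgreSQL's four output rows; Spark's two-row result suggests the duplicate set is being collapsed.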
Issue Links
- relates to
  - SPARK-29699 Different answers in nested aggregates with window functions (Open)
  - SPARK-29701 Different answers when empty input given in GROUPING SETS (Closed)