Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      The query below, run over an empty input table, returns different answers in PostgreSQL and Spark:

      postgres=# create table gstest_empty (a integer, b integer, v integer);
      CREATE TABLE
      postgres=# select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
       a | b | sum | count 
      ---+---+-----+-------
         |   |     |     0
      (1 row)
      
      scala> sql("""select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),())""").show
      +---+---+------+--------+
      |  a|  b|sum(v)|count(1)|
      +---+---+------+--------+
      +---+---+------+--------+
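      For context, the PostgreSQL answer follows the SQL standard: the empty grouping set `()` denotes a grand-total aggregation, which produces exactly one group even over zero input rows, so `count(*)` is 0 and the other columns are NULL. The following is a minimal Python sketch of that semantics (the `grouping_sets` helper is hypothetical, written only to illustrate the standard's behavior; it is not Spark's or PostgreSQL's implementation):

```python
from collections import defaultdict

def grouping_sets(rows, keys, sets):
    """Evaluate sum(v), count(*) for each grouping set over `rows`.

    rows: list of dicts with keys a, b, v
    keys: all grouping columns, e.g. ("a", "b")
    sets: the grouping sets, e.g. [("a", "b"), ()]
    """
    out = []
    for gset in sets:
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in gset)].append(row)
        if not gset:
            # SQL-standard behavior: the empty grouping set is a plain
            # grand-total aggregation, i.e. exactly one group exists
            # even when the input has no rows.
            groups.setdefault((), [])
        for key, grp in groups.items():
            vals = [r["v"] for r in grp if r["v"] is not None]
            out.append({
                # Columns not in this grouping set become NULL (None).
                **{k: dict(zip(gset, key)).get(k) for k in keys},
                "sum": sum(vals) if vals else None,
                "count": len(grp),
            })
    return out

# Empty input reproduces the PostgreSQL answer: one row (NULL, NULL, NULL, 0).
print(grouping_sets([], ("a", "b"), [("a", "b"), ()]))
# → [{'a': None, 'b': None, 'sum': None, 'count': 0}]
```

      Under these semantics Spark's empty result would omit the grand-total row that the standard (and PostgreSQL) requires.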
       

      Attachments

        Issue Links

          Activity

            gurwls223 Hyukjin Kwon added a comment - edited

            maropu Let me lower the priority (given the discussion in the PR).

            maropu Takeshi Yamamuro added a comment

            Yea, never mind. I don't have a smart idea to solve this issue right now.
            dongjoon Dongjoon Hyun added a comment -

            Since this is closed, I removed the target version, `3.0.0`.

            dongjoon Dongjoon Hyun added a comment

            For the record, please see the discussion on the following PR: https://github.com/apache/spark/pull/27233. Although this is a correctness issue, the existing behavior of Apache Spark 2.4 is also reasonable and consistent with Oracle/SQL Server, so we keep this behavior in Apache Spark 3.0+. This issue was moved from the PostgreSQL compatibility umbrella Jira into this Spark versioning umbrella Jira (SPARK-31085) to give better context.

            To put a conclusion: I think this PR does fix a "correctness" issue according to the SQL standard. But as @tgravescs said in #27233 (comment), the current behavior looks reasonable as well, and is the same as Oracle/SQL Server.

            This is a very corner case, and most likely people don't care.

            People

              Assignee: Unassigned
              Reporter: maropu Takeshi Yamamuro
              Votes: 0
              Watchers: 2
