Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      The query below, run over an empty input table, returns different answers in PostgreSQL and Spark:

      postgres=# create table gstest_empty (a integer, b integer, v integer);
      CREATE TABLE
      postgres=# select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),());
       a | b | sum | count 
      ---+---+-----+-------
         |   |     |     0
      (1 row)
      
      scala> sql("""select a, b, sum(v), count(*) from gstest_empty group by grouping sets ((a,b),())""").show
      +---+---+------+--------+
      |  a|  b|sum(v)|count(1)|
      +---+---+------+--------+
      +---+---+------+--------+
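      For context, the PostgreSQL answer follows the SQL standard: the empty grouping set `()` denotes a grand-total aggregation, which produces exactly one group even over zero input rows, so `count(*)` is 0 and the other columns are NULL. The following is a minimal Python sketch of that semantics (the `grouping_sets` helper is hypothetical, written only to illustrate the standard's behavior; it is not Spark's or PostgreSQL's implementation):

```python
from collections import defaultdict

def grouping_sets(rows, keys, sets):
    """Evaluate sum(v), count(*) for each grouping set over `rows`.

    rows: list of dicts with keys a, b, v
    keys: all grouping columns, e.g. ("a", "b")
    sets: the grouping sets, e.g. [("a", "b"), ()]
    """
    out = []
    for gset in sets:
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in gset)].append(row)
        if not gset:
            # SQL-standard behavior: the empty grouping set is a plain
            # grand-total aggregation, i.e. exactly one group exists
            # even when the input has no rows.
            groups.setdefault((), [])
        for key, grp in groups.items():
            vals = [r["v"] for r in grp if r["v"] is not None]
            out.append({
                # Columns not in this grouping set become NULL (None).
                **{k: dict(zip(gset, key)).get(k) for k in keys},
                "sum": sum(vals) if vals else None,
                "count": len(grp),
            })
    return out

# Empty input reproduces the PostgreSQL answer: one row (NULL, NULL, NULL, 0).
print(grouping_sets([], ("a", "b"), [("a", "b"), ()]))
# → [{'a': None, 'b': None, 'sum': None, 'count': 0}]
```

      Under these semantics Spark's empty result would omit the grand-total row that the standard (and PostgreSQL) requires.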
       

      Attachments

        Issue Links

          Activity

            gurwls223 Hyukjin Kwon added a comment - edited

            maropu Let me lower the priority (given the discussion in the PR).

            maropu Takeshi Yamamuro added a comment

            Yea, never mind. I don't have a smart idea to solve this issue right now.
            dongjoon Dongjoon Hyun added a comment -

            Since this is closed, I removed the target version, `3.0.0`.

            dongjoon Dongjoon Hyun added a comment

            For the record, please see the discussion on the following PR: https://github.com/apache/spark/pull/27233. Although this is a correctness issue, the existing behavior of Apache Spark 2.4 is also reasonable and consistent with Oracle/SQL Server, so we keep this behavior in Apache Spark 3.0+. This issue was moved from the PostgreSQL compatibility umbrella Jira into this Spark versioning umbrella Jira (SPARK-31085) to give better context.

            To put a conclusion: I think this PR does fix a "correctness" issue according to the SQL standard. But as @tgravescs said in #27233 (comment), the current behavior looks reasonable as well, and is the same as Oracle/SQL Server.

            This is a very corner case, and most likely people don't care.

            People

              Assignee: Unassigned
              Reporter: maropu Takeshi Yamamuro
              Votes: 0
              Watchers: 2
