Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      SELECT min(x ORDER BY y) FROM (VALUES(1, NULL)) AS d(x,y);
      SELECT min(x ORDER BY y) FROM (VALUES(1, 2)) AS d(x,y);
      

      https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/sql/aggregates.sql#L978-L982
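      For context, PostgreSQL accepts an ORDER BY clause inside an ordinary aggregate call; the ordering matters for order-sensitive aggregates such as array_agg or string_agg, while for min/max it does not change the result but the syntax is still accepted (the linked regression test exercises exactly that). A minimal PostgreSQL sketch — not valid Spark SQL today, which is the point of this issue:

      -- PostgreSQL: ORDER BY inside an aggregate controls the order in which
      -- input rows are fed to the aggregate function.
      SELECT array_agg(x ORDER BY y) FROM (VALUES (1, 2), (3, 1)) AS d(x, y);
      -- Rows are consumed in order of y, so the array is {3,1}.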

      Attachments

        1. image-2020-06-30-15-49-46-796.png (8 kB, uploaded by Will Zimmerman)

        Activity

          beliefer Jiaan Geng added a comment -

          https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/expected/aggregates.out#L2239

          yumwang I didn't understand the meaning of this syntax. If it is valuable, I will do it.

          Will_Zimmerman Will Zimmerman added a comment -

          yumwang - Would this allow for changing the NULL ordering (e.g. NULLS FIRST or NULLS LAST)? Currently, when evaluating the MIN() of a struct, NULL appears to be treated as the minimum possible value. It would be nice if the NULL ordering could be changed, or if evaluation of structs behaved like MIN() of a scalar value (where NULLs are ignored unless that is the only option). Also, I can't find in the Spark documentation what the expected behavior of NULL values within a struct is. The documentation I'm referring to can be found at https://spark.apache.org/docs/3.0.0-preview/sql-ref-null-semantics.html.

          SELECT ID
             ,COLLECT_SET(STRUCT(x,y)) AS collection
             ,MIN(x) AS min_of_x
             ,MIN(y) AS min_of_y
             ,MIN(STRUCT(x,y)) AS min_of_collection
             ,MAX(STRUCT(x,y)) AS max_of_collection
          FROM (VALUES (1234390, 12.0, 'string_1'),
                       (1234390, 37.4, 'string_2'),
                       (1234390, 6.9,  NULL),
                       (1234390, 3.1,  'string_3'),
                       (1234390, NULL, 'string_4'),
                       (1234390, NULL, NULL)) AS d(ID, x, y)
          GROUP BY 1

          The result of this query in Spark 2.4.4 can be seen in the attached screenshot; the desired outcome for min_of_collection would be (3.1, string_3).
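          Until aggregate ORDER BY (or configurable NULL ordering) exists, one possible workaround in current Spark SQL — a sketch, assuming MIN() skips NULL inputs entirely, as it does for scalar values — is to null out the whole struct for rows whose ordering field is NULL, so those rows are ignored by the aggregate:

          -- Sketch of a workaround: rows where x IS NULL produce a NULL struct,
          -- which MIN ignores, so (3.1, 'string_3') wins instead of (NULL, ...).
          SELECT ID
             ,MIN(CASE WHEN x IS NOT NULL THEN STRUCT(x, y) END) AS min_of_collection
          FROM (VALUES (1234390, 12.0, 'string_1'),
                       (1234390, 3.1,  'string_3'),
                       (1234390, NULL, 'string_4')) AS d(ID, x, y)
          GROUP BY 1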


          People

            Assignee: Unassigned
            Reporter: yumwang Yuming Wang
            Votes: 0
            Watchers: 3
