[CALCITE-4483] WITHIN DISTINCT clause for aggregate functions (experimental) - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.27.0
Component/s: None
Labels:
- pull-request-available

Description

Add a WITHIN DISTINCT clause to aggregate functions, allowing duplicate rows to be eliminated before entering the function.

This feature is non-standard, and in fact does not exist in any database I am aware of. Therefore this feature is experimental, and may evolve or be removed without notice.

It is related to DISTINCT, and is in fact a generalization of it. DISTINCT can always be rewritten in terms of WITHIN DISTINCT. For example, SUM(DISTINCT sal) is equivalent to SUM(sal) WITHIN DISTINCT (sal).

Consider the query

SELECT SUM(age),
   SUM(DISTINCT age),
  SUM(age) WITHIN DISTINCT (name)
FROM Friends

where Friends has the rows

name   age job
====== === ==========
Julian  16 Programmer
Dick    15
Anne    13 Car wash
George  15 Lifeguard
George  15 Dog walker
Timmy    4

Note that there are two rows for George, because she has two jobs.

The values of the columns are as follows:

SUM(age) has the value (16 + 15 + 13 + 15 + 15 + 4) = 78;
SUM(DISTINCT age) has the value (16 + 15 + 13 + 4) = 48;
SUM(age) WITHIN DISTINCT (name) has the value (16 + 15 + 13 + 15 + 4) = 63.

WITHIN DISTINCT has treated the two 15 values for George as one value, but has still counted the 15 for Dick separately.

The WITHIN DISTINCT clause can be useful to prevent double-counting when duplicates have been added via a many-to-one join.

Attachments

Issue Links

depends upon

CALCITE-4484 Add UNIQUE_VALUE(x) aggregate function, that throws if x is not unique

Open

relates to

CALCITE-4496 Measure columns ("SELECT ... AS MEASURE")

Resolved

CALCITE-6217 PER clause for aggregate functions

Open

links to

GitHub Pull Request #2339

Activity

People

Assignee:: Julian Hyde

Reporter:: Julian Hyde

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 01/Feb/21 19:30

Updated:: 02/Apr/24 07:32

Resolved:: 11/Feb/21 23:54

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m