Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- 3.2.0
Description
From allisonwang-db:
There is no COUNT bug when the correlated equality predicates also appear in the GROUP BY clause. However, the current logic for handling the COUNT bug still adds the default aggregate function value in this case, so the query returns incorrect results.
create view t1(c1, c2) as values (0, 1), (1, 2);
create view t2(c1, c2) as values (0, 2), (0, 3);
select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from t1;
-- Correct answer: [(0, 1, 2), (1, 2, null)]
-- Actual result (incorrect; the second row should contain null, not 0):
+---+---+------------------+
|c1 |c2 |scalarsubquery(c1)|
+---+---+------------------+
|0  |1  |2                 |
|1  |2  |0                 |
+---+---+------------------+
This bug affects scalar subqueries in RewriteCorrelatedScalarSubquery, but lateral subqueries handle it correctly in DecorrelateInnerQuery. Related: https://issues.apache.org/jira/browse/SPARK-36113
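The expected semantics can be checked outside Spark. The sketch below is not Spark code: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine, uses tables instead of views, and qualifies the grouping column as t2.c1 for clarity. The third query is a hand-written join rewrite with coalesce, meant only to illustrate how adding a default count of 0 reproduces the wrong row; it is not the plan that RewriteCorrelatedScalarSubquery actually generates.

```python
import sqlite3

# Same data as the repro above, loaded into tables (assumption: tables
# instead of views, SQLite instead of Spark).
SETUP = """
create table t1(c1 integer, c2 integer);
create table t2(c1 integer, c2 integer);
insert into t1 values (0, 1), (1, 2);
insert into t2 values (0, 2), (0, 3);
"""

QUERIES = {
    # Grouped subquery: no matching rows means no groups, so the scalar
    # subquery is empty and evaluates to NULL.
    "with_group_by": """
        select c1, c2,
               (select count(*) from t2 where t1.c1 = t2.c1 group by t2.c1)
        from t1 order by c1
    """,
    # Ungrouped subquery: count(*) over zero rows is 0. This is the case
    # the COUNT bug handling exists for.
    "without_group_by": """
        select c1, c2,
               (select count(*) from t2 where t1.c1 = t2.c1)
        from t1 order by c1
    """,
    # Illustrative join rewrite that always fills in a default of 0.
    # Applied to the grouped query, it yields the incorrect result.
    "join_with_default": """
        select t1.c1, t1.c2, coalesce(s.cnt, 0)
        from t1
        left join (select c1, count(*) as cnt from t2 group by c1) as s
          on t1.c1 = s.c1
        order by t1.c1
    """,
}


def run_queries():
    """Run each query against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_queries().items():
        print(name, rows)
```

Under these assumptions, with_group_by returns NULL for the row with c1 = 1, matching the correct answer above, while the other two queries return 0 for that row.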
Issue Links
- relates to SPARK-36113 Unify the logic to handle COUNT bug for scalar and lateral subqueries (Open)
- links to