Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
Consider the following query:
CREATE TABLE T1(a STRING, b STRING, s BIGINT);
INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
SELECT * FROM (
  SELECT a, b, sum(s)
  FROM T1
  GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
) t
WHERE a IS NOT NULL;
When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query will output:
NULL  NULL  123456
NULL  bbbb  123456
aaaa  NULL  123456
aaaa  bbbb  123456
The predicate "a IS NOT NULL" has no effect, which is incorrect: the two rows where a is NULL should have been filtered out.
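When performing PPD (predicate pushdown) for a GBY (GROUP BY) operator, we should make sure that every grouping set contains the expression being processed before pushing it down. Otherwise, the value of the expression changes after the GBY (a grouping set that omits the column sets it to NULL), and the result is wrong.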
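The safety condition described above can be sketched as follows. This is only an illustration, not Hive's actual optimizer code: the function name can_push_below_group_by and the representation of grouping sets as tuples of column names are hypothetical, and the sketch assumes the predicate references only grouping keys.

```python
from typing import Iterable


def can_push_below_group_by(
    predicate_columns: Iterable[str],
    grouping_sets: Iterable[Iterable[str]],
) -> bool:
    """Hypothetical check: a predicate may move below the GROUP BY only if
    every grouping set contains every column the predicate references.

    A grouping set that omits a column emits NULL for it, so a predicate
    evaluated before the aggregation would see a different value than one
    evaluated after it.
    """
    needed = frozenset(predicate_columns)
    return all(needed <= frozenset(gs) for gs in grouping_sets)


# Grouping sets from the query in this report: (), (a), (b), (a, b).
report_sets = [(), ("a",), ("b",), ("a", "b")]

# "a IS NOT NULL" references column a, which is missing from () and (b),
# so the predicate must stay above the GROUP BY.
print(can_push_below_group_by({"a"}, report_sets))

# If every grouping set contained a, pushing the predicate down would be safe.
print(can_push_below_group_by({"a"}, [("a",), ("a", "b")]))
```

For the query in this report the check fails, so the filter has to be evaluated after the aggregation, which leaves only the two rows where a is 'aaaa'.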