Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
-
ghx-label-6
Description
Queries with the following characteristics may produce wrong results due to an incorrectly assigned runtime filter:
- The query option RUNTIME_FILTER_MODE is set to GLOBAL
- The query has an outer join
- A scan on the nullable side of that outer join has a runtime filter with a NULL-checking expression such as COALESCE/IFNULL/CASE
- The latter point imples that there is another join above the outer join with a NULL-checking expression in it's join condition
Reproduction:
select count(*) from functional.alltypestiny t1
left outer join functional.alltypestiny t2
on t1.id = t2.id
where coalesce(t2.id + 10, 100) in (select 100)
+----------+
| count(*) |
+----------+
| 8 |
+----------+
We expect a count of 0. A count of 8 is incorrect.
Query plan:
+---------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=3.88MB |
| Per-Host Resource Estimates: Memory=87.88MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 10:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | |
| 09:EXCHANGE [UNPARTITIONED] |
| | |
| 05:AGGREGATE |
| | output: count(*) |
| | |
| 04:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
| | hash predicates: coalesce(t2.id + 10, 100) = `$a$1`.`$c$1` |
| | runtime filters: RF000 <- `$a$1`.`$c$1` |
| | |
| |--08:EXCHANGE [BROADCAST] |
| | | |
| | 02:UNION |
| | constant-operands=1 |
| | |
| 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
| | hash predicates: t1.id = t2.id |
| | |
| |--07:EXCHANGE [HASH(t2.id)] |
| | | |
| | 01:SCAN HDFS [functional.alltypestiny t2] |
| | partitions=4/4 files=4 size=460B |
| | runtime filters: RF000 -> coalesce(t2.id + 10, 100) <--- This runtime filter is not correct |
| | |
| 06:EXCHANGE [HASH(t1.id)] |
| | |
| 00:SCAN HDFS [functional.alltypestiny t1] |
| partitions=4/4 files=4 size=460B |
+---------------------------------------------------------------+
Explanation:
- RF000 filters out all rows in scan 01
- In join 03 there are no join matches since the right-hand is empty. All rows from the right-hand side are nulled.
- The join condition in join 04 now satisfies all input rows because every "t2.id" is NULL, so after the COALESCE() the join condition becomes 100 = 100
Workaround
- Set RUNTIME_FILTER_MODE to LOCAL or OFF