Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
- None
Description
Random expressions on the consumer side of a runtime filter are evaluated independently of the "final" join, which gives rows an extra chance to be dropped. As a result, the same query can return fewer or different rows when a runtime filter is used than when it is not.
Example:
use tpch_parquet;
set DISABLE_ROW_RUNTIME_FILTERING=0;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9722
set DISABLE_ROW_RUNTIME_FILTERING=1;
select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
result: 9803
(rand() is pseudo-random, so running the same query without changing the query option always returns the same result.)
Optimizations like runtime filters should have no effect on the results, even in case of non-deterministic expressions.
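The double evaluation described above can be illustrated outside Impala with a small simulation. This is only a sketch, not Impala's implementation: the function name `count_join_rows`, the row counts (10000 suppliers, nation keys 0..24, roughly the TPC-H scale factor 1 shape) and the uniform key distribution are all assumptions made for illustration. The "filter" and the "join" each get their own draw of the random expression, so a row has to pass both when the filter is on.

```python
import random

NUM_SUPPLIERS = 10000  # assumed supplier row count (illustrative)
NUM_NATIONS = 25       # assumed nation keys 0..24 (illustrative)


def count_join_rows(use_runtime_filter, seed=0):
    """Count rows matching s_nationkey + cast(rand()*2 as int) = n_nationkey."""
    data_rng = random.Random(seed)        # generates s_nationkey values
    filter_rng = random.Random(seed + 1)  # rand() as evaluated for the filter
    join_rng = random.Random(seed + 2)    # rand() as evaluated for the join
    nation_keys = set(range(NUM_NATIONS))

    matched = 0
    for _ in range(NUM_SUPPLIERS):
        s_nationkey = data_rng.randrange(NUM_NATIONS)
        # Draw both values for every row so that the join sees the same
        # random sequence whether or not the filter is enabled.
        filter_draw = int(filter_rng.random() * 2)
        join_draw = int(join_rng.random() * 2)

        # First evaluation: the filter drops rows whose key is not on the
        # build side, using its own independent draw.
        if use_runtime_filter and (s_nationkey + filter_draw) not in nation_keys:
            continue

        # Second evaluation: the join itself re-evaluates the expression.
        if (s_nationkey + join_draw) in nation_keys:
            matched += 1
    return matched


if __name__ == "__main__":
    print("with filter:   ", count_join_rows(use_runtime_filter=True))
    print("without filter:", count_join_rows(use_runtime_filter=False))
```

Under these assumptions only rows with s_nationkey = 24 can miss the join. Without the filter such a row survives with probability 1/2, so about 9800 rows are expected; with the filter it must pass two independent draws (probability 1/4), so about 9700 rows are expected. That is in the same range as the 9803 and 9722 reported in the example.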
Issue Links
- is related to: IMPALA-5509 Runtime filter : Extend runtime filter to support Dictionary values (Resolved)