Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Fix Version/s: None
Description
The following query fails on TPC-H SF100 data with the error: UNSUPPORTED_OPERATION ERROR: Hash-Join can not partition the inner data any further (probably due to too many join-key duplicates)
set `exec.hashjoin.enable.runtime_filter` = true;
set `exec.hashjoin.runtime_filter.max.waiting.time` = 10000;
set `planner.enable_broadcast_join` = false;
select count(*) from lineitem l1 where l1.l_discount IN ( select distinct(cast(l2.l_discount as double)) from lineitem l2);
reset `exec.hashjoin.enable.runtime_filter`;
reset `exec.hashjoin.runtime_filter.max.waiting.time`;
reset `planner.enable_broadcast_join`;
The subquery uses the DISTINCT keyword, so its output should not contain duplicate values.
I suspect that the failure is caused by the semi-join, because the query succeeds when semi-join is explicitly disabled.
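For comparison, a sketch of the run with semi-join disabled. It assumes the planner option is named `planner.enable_semijoin`; the option name is not given in this report, so verify it against sys.options on your build before relying on it.

```sql
-- Assumed option name: planner.enable_semijoin (not stated in the report)
set `planner.enable_semijoin` = false;
set `exec.hashjoin.enable.runtime_filter` = true;
set `exec.hashjoin.runtime_filter.max.waiting.time` = 10000;
set `planner.enable_broadcast_join` = false;

-- Same query as above; reported to succeed once semi-join is off
select count(*)
from lineitem l1
where l1.l_discount IN (
  select distinct(cast(l2.l_discount as double)) from lineitem l2
);

-- Restore the defaults
reset `planner.enable_semijoin`;
reset `exec.hashjoin.enable.runtime_filter`;
reset `exec.hashjoin.runtime_filter.max.waiting.time`;
reset `planner.enable_broadcast_join`;
```

Comparing the plans of the two runs with EXPLAIN PLAN FOR may help confirm whether the semi-join is what changes the build side of the hash join.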