Description
This query has an sq_count check, though is useless on a constant key.
hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type); Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product OK Plan optimized by CBO. Vertex dependency in root stage Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) Reducer 3 <- Map 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) Reducer 6 <- Map 5 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Map 1 vectorized, llap File Output Operator [FS_64] Select Operator [SEL_63] (rows=66666666 width=621) Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] Filter Operator [FIL_62] (rows=66666666 width=625) predicate:(_col5 > _col10) Map Join Operator [MAPJOIN_61] (rows=200000000 width=625) Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] <-Reducer 6 [BROADCAST_EDGE] vectorized, llap BROADCAST [RS_58] Select Operator [SEL_57] (rows=1 width=4) Output:["_col0"] Group By Operator [GBY_56] (rows=1 width=89) Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 <-Map 5 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_55] PartitionCols:_col0 Group By Operator [GBY_54] (rows=86 width=89) Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' Select Operator [SEL_53] (rows=1212121 width=109) Output:["_col1"] Filter Operator [FIL_52] (rows=1212121 width=109) predicate:(p_type = '1') TableScan [TS_17] (rows=200000000 width=109) tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621) Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] <-Reducer 4 [BROADCAST_EDGE] vectorized, llap BROADCAST [RS_51] Select Operator [SEL_50] (rows=1 width=8) Filter Operator [FIL_49] (rows=1 width=8) predicate:(sq_count_check(_col0) <= 1) Group By Operator [GBY_48] (rows=1 width=8) Output:["_col0"],aggregations:["count(VALUE._col0)"] <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap PARTITION_ONLY_SHUFFLE [RS_47] Group By Operator [GBY_46] (rows=1 width=8) Output:["_col0"],aggregations:["count()"] Select Operator [SEL_45] (rows=1 width=85) Group By Operator [GBY_44] (rows=1 width=85) Output:["_col0"],keys:KEY._col0 <-Map 2 [SIMPLE_EDGE] vectorized, llap SHUFFLE [RS_43] PartitionCols:_col0 Group By Operator [GBY_42] (rows=83 width=85) Output:["_col0"],keys:'1' Select Operator [SEL_41] (rows=1212121 width=105) Filter Operator [FIL_40] (rows=1212121 width=105) predicate:(p_type = '1') TableScan [TS_2] (rows=200000000 width=105) tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] <-Select Operator [SEL_59] (rows=200000000 width=621) Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] TableScan [TS_0] (rows=200000000 width=621) tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type. Fixed by HIVE-16851
Attachments
Attachments
Issue Links
- is related to
-
HIVE-16851 Scalar subquery with group by missing sq_count_check UDF
- Closed
- relates to
-
HIVE-15544 Support scalar subqueries
- Closed
-
HIVE-16330 Improve plans for scalar subquery with aggregates
- Closed
-
HIVE-16885 Non-equi Joins: Filter clauses should be pushed into the ON clause
- Closed
- links to