[HIVE-16793] Scalar sub-query: sq_count_check not required if gby keys are constant - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels:
None

Description

This query has an sq_count check, though is useless on a constant key.

hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
OK
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
Reducer 6 <- Map 5 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 1 vectorized, llap
      File Output Operator [FS_64]
        Select Operator [SEL_63] (rows=66666666 width=621)
          Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
          Filter Operator [FIL_62] (rows=66666666 width=625)
            predicate:(_col5 > _col10)
            Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
              Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
            <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
              BROADCAST [RS_58]
                Select Operator [SEL_57] (rows=1 width=4)
                  Output:["_col0"]
                  Group By Operator [GBY_56] (rows=1 width=89)
                    Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
                  <-Map 5 [SIMPLE_EDGE] vectorized, llap
                    SHUFFLE [RS_55]
                      PartitionCols:_col0
                      Group By Operator [GBY_54] (rows=86 width=89)
                        Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
                        Select Operator [SEL_53] (rows=1212121 width=109)
                          Output:["_col1"]
                          Filter Operator [FIL_52] (rows=1212121 width=109)
                            predicate:(p_type = '1')
                            TableScan [TS_17] (rows=200000000 width=109)
                              tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
            <-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
                Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
              <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
                BROADCAST [RS_51]
                  Select Operator [SEL_50] (rows=1 width=8)
                    Filter Operator [FIL_49] (rows=1 width=8)
                      predicate:(sq_count_check(_col0) <= 1)
                      Group By Operator [GBY_48] (rows=1 width=8)
                        Output:["_col0"],aggregations:["count(VALUE._col0)"]
                      <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
                        PARTITION_ONLY_SHUFFLE [RS_47]
                          Group By Operator [GBY_46] (rows=1 width=8)
                            Output:["_col0"],aggregations:["count()"]
                            Select Operator [SEL_45] (rows=1 width=85)
                              Group By Operator [GBY_44] (rows=1 width=85)
                                Output:["_col0"],keys:KEY._col0
                              <-Map 2 [SIMPLE_EDGE] vectorized, llap
                                SHUFFLE [RS_43]
                                  PartitionCols:_col0
                                  Group By Operator [GBY_42] (rows=83 width=85)
                                    Output:["_col0"],keys:'1'
                                    Select Operator [SEL_41] (rows=1212121 width=105)
                                      Filter Operator [FIL_40] (rows=1212121 width=105)
                                        predicate:(p_type = '1')
                                        TableScan [TS_2] (rows=200000000 width=105)
                                          tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
              <-Select Operator [SEL_59] (rows=200000000 width=621)
                  Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                  TableScan [TS_0] (rows=200000000 width=621)
                    tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]

~~The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type.~~ Fixed by ~~HIVE-16851~~

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-16793.1.patch
19/Jun/17 23:54
60 kB
Vineet Garg
HIVE-16793.2.patch
21/Jun/17 22:49
294 kB
Vineet Garg
HIVE-16793.3.patch
26/Jun/17 17:48
273 kB
Vineet Garg
HIVE-16793.4.patch
27/Jun/17 21:20
475 kB
Vineet Garg
HIVE-16793.5.patch
12/Jul/17 21:26
59 kB
Vineet Garg
HIVE-16793.6.patch
12/Jul/17 23:50
58 kB
Vineet Garg

Issue Links

is related to

HIVE-16851 Scalar subquery with group by missing sq_count_check UDF

Closed

relates to

HIVE-15544 Support scalar subqueries

Closed

HIVE-16330 Improve plans for scalar subquery with aggregates

Closed

HIVE-16885 Non-equi Joins: Filter clauses should be pushed into the ON clause

Closed

links to

ReviewBoard

Scalar sub-query: sq_count_check not required if gby keys are constant

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates