Hive / HIVE-16885

Non-equi Joins: Filter clauses should be pushed into the ON clause

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Physical Optimizer
    • Labels: None

    Description

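      FIL_24 -> MAPJOIN_23: the filter predicate (_col5 > _col9) is applied above the cross-product map join instead of being evaluated as the join condition.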

      hive> explain  select * from part where p_size > (select max(p_size) from part group by p_type);
      Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
      OK
      Plan optimized by CBO.
      
      Vertex dependency in root stage
      Map 1 <- Reducer 3 (BROADCAST_EDGE)
      Reducer 3 <- Map 2 (SIMPLE_EDGE)
      
      Stage-0
        Fetch Operator
          limit:-1
          Stage-1
            Map 1 vectorized, llap
            File Output Operator [FS_26]
              Select Operator [SEL_25] (rows=11000000000 width=621)
                Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                Filter Operator [FIL_24] (rows=11000000000 width=625)
                  predicate:(_col5 > _col9)
                  Map Join Operator [MAPJOIN_23] (rows=33000000000 width=625)
                    Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
                  <-Reducer 3 [BROADCAST_EDGE] vectorized, llap
                    BROADCAST [RS_21]
                      Select Operator [SEL_20] (rows=165 width=4)
                        Output:["_col0"]
                        Group By Operator [GBY_19] (rows=165 width=109)
                          Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
                        <-Map 2 [SIMPLE_EDGE] vectorized, llap
                          SHUFFLE [RS_18]
                            PartitionCols:_col0
                            Group By Operator [GBY_17] (rows=14190 width=109)
                              Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
                              Select Operator [SEL_16] (rows=200000000 width=109)
                                Output:["p_type","p_size"]
                                TableScan [TS_2] (rows=200000000 width=109)
                                  tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
                  <-Select Operator [SEL_22] (rows=200000000 width=621)
                      Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
                      TableScan [TS_0] (rows=200000000 width=621)
                        tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
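
      To illustrate why the operator order matters, here is a toy sketch in Python. It is not Hive's implementation: the function names and the sample data are made up for illustration, and plain nested loops stand in for the map join. It compares a cross join followed by a filter (the shape of the plan above) with a join that evaluates the same predicate as its condition.

```python
def cross_join_then_filter(big, small, predicate):
    """Filter above the join: the join emits every (big, small) pair."""
    joined = [(b, s) for b in big for s in small]
    kept = [pair for pair in joined if predicate(*pair)]
    return kept, len(joined)


def join_with_condition(big, small, predicate):
    """Predicate inside the join: only matching pairs are emitted."""
    joined = [(b, s) for b in big for s in small if predicate(b, s)]
    return joined, len(joined)


# Made-up sample data: p_size values and per-p_type maxima.
p_sizes = [10, 40, 25, 5]
max_sizes = [40, 25]
greater = lambda p_size, max_size: p_size > max_size

rows_a, emitted_a = cross_join_then_filter(p_sizes, max_sizes, greater)
rows_b, emitted_b = join_with_condition(p_sizes, max_sizes, greater)
print(rows_a == rows_b, emitted_a, emitted_b)
```

      Both variants return the same rows, but in this sketch the first join emits all 8 pairs before filtering, while the second emits only the single matching pair. In the plan above the same effect shows up as the Map Join row estimate being larger than the Filter row estimate.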
      

      Attachments

        1. HIVE-16885.03.patch
          378 kB
          jcamachorodriguez
        2. HIVE-16885.02.patch
          232 kB
          jcamachorodriguez
        3. HIVE-16885.01.patch
          231 kB
          jcamachorodriguez
        4. HIVE-16885.patch
          72 kB
          jcamachorodriguez

            People

              Assignee: Jesús Camacho Rodríguez (jcamacho)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 3
