  1. Apache Drill
  2. DRILL-6902

Extra limit operator is not needed

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.15.0
    • None
    • None

    Description

      For TPC-DS query 49, the query plan contains an extra Limit operator that is not needed.

      Here is the query:

      SELECT 'web' AS channel, 
             web.item, 
             web.return_ratio, 
             web.return_rank, 
             web.currency_rank 
      FROM   (SELECT item, 
                     return_ratio, 
                     currency_ratio, 
                     Rank() 
                       OVER ( 
                         ORDER BY return_ratio)   AS return_rank, 
                     Rank() 
                       OVER ( 
                         ORDER BY currency_ratio) AS currency_rank 
              FROM   (SELECT ws.ws_item_sk                                       AS 
                             item, 
                             ( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS DEC(15, 
                                    4)) / 
                               Cast( 
                               Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS 
                             return_ratio, 
                             ( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4)) 
                               / Cast( 
                               Sum( 
                               COALESCE(ws.ws_net_paid, 0)) AS DEC(15, 
                               4)) )                                             AS 
                             currency_ratio 
                      FROM   web_sales ws 
                             LEFT OUTER JOIN web_returns wr 
                                          ON ( ws.ws_order_number = wr.wr_order_number 
                                               AND ws.ws_item_sk = wr.wr_item_sk ), 
                             date_dim 
                      WHERE  wr.wr_return_amt > 10000 
                             AND ws.ws_net_profit > 1 
                             AND ws.ws_net_paid > 0 
                             AND ws.ws_quantity > 0 
                             AND ws_sold_date_sk = d_date_sk 
                             AND d_year = 1999 
                             AND d_moy = 12 
                      GROUP  BY ws.ws_item_sk) in_web) web 
      WHERE  ( web.return_rank <= 10 
                OR web.currency_rank <= 10 ) 
      UNION 
      SELECT 'catalog' AS channel, 
             catalog.item, 
             catalog.return_ratio, 
             catalog.return_rank, 
             catalog.currency_rank 
      FROM   (SELECT item, 
                     return_ratio, 
                     currency_ratio, 
                     Rank() 
                       OVER ( 
                         ORDER BY return_ratio)   AS return_rank, 
                     Rank() 
                       OVER ( 
                         ORDER BY currency_ratio) AS currency_rank 
              FROM   (SELECT cs.cs_item_sk                                       AS 
                             item, 
                             ( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS DEC(15, 
                                    4)) / 
                               Cast( 
                               Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS 
                             return_ratio, 
                             ( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 4 
                                    )) / 
                               Cast(Sum( 
                               COALESCE(cs.cs_net_paid, 0)) AS DEC( 
                               15, 4)) )                                         AS 
                             currency_ratio 
                      FROM   catalog_sales cs 
                             LEFT OUTER JOIN catalog_returns cr 
                                          ON ( cs.cs_order_number = cr.cr_order_number 
                                               AND cs.cs_item_sk = cr.cr_item_sk ), 
                             date_dim 
                      WHERE  cr.cr_return_amount > 10000 
                             AND cs.cs_net_profit > 1 
                             AND cs.cs_net_paid > 0 
                             AND cs.cs_quantity > 0 
                             AND cs_sold_date_sk = d_date_sk 
                             AND d_year = 1999 
                             AND d_moy = 12 
                      GROUP  BY cs.cs_item_sk) in_cat) catalog 
      WHERE  ( catalog.return_rank <= 10 
                OR catalog.currency_rank <= 10 ) 
      UNION 
      SELECT 'store' AS channel, 
             store.item, 
             store.return_ratio, 
             store.return_rank, 
             store.currency_rank 
      FROM   (SELECT item, 
                     return_ratio, 
                     currency_ratio, 
                     Rank() 
                       OVER ( 
                         ORDER BY return_ratio)   AS return_rank, 
                     Rank() 
                       OVER ( 
                         ORDER BY currency_ratio) AS currency_rank 
              FROM   (SELECT sts.ss_item_sk                                       AS 
                             item, 
                             ( Cast(Sum(COALESCE(sr.sr_return_quantity, 0)) AS DEC(15, 
                                    4)) / 
                               Cast( 
                               Sum(COALESCE(sts.ss_quantity, 0)) AS DEC(15, 4)) ) AS 
                             return_ratio, 
                             ( Cast(Sum(COALESCE(sr.sr_return_amt, 0)) AS DEC(15, 4)) 
                               / Cast( 
                               Sum( 
                               COALESCE(sts.ss_net_paid, 0)) AS DEC(15, 4)) )     AS 
                             currency_ratio 
                      FROM   store_sales sts 
                             LEFT OUTER JOIN store_returns sr 
                                          ON ( sts.ss_ticket_number = 
                                               sr.sr_ticket_number 
                                               AND sts.ss_item_sk = sr.sr_item_sk ), 
                             date_dim 
                      WHERE  sr.sr_return_amt > 10000 
                             AND sts.ss_net_profit > 1 
                             AND sts.ss_net_paid > 0 
                             AND sts.ss_quantity > 0 
                             AND ss_sold_date_sk = d_date_sk 
                             AND d_year = 1999 
                             AND d_moy = 12 
                      GROUP  BY sts.ss_item_sk) in_store) store 
      WHERE  ( store.return_rank <= 10 
                OR store.currency_rank <= 10 ) 
      ORDER  BY 1, 
                4, 
                5
      LIMIT 100; 
      
      

      Here is the top of the plan:

      00-00    Screen : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382656934813E10 rows, 1.6644370208245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33692
      00-01      Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382646934813E10 rows, 1.6644370207245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33691
      00-02        SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382546934813E10 rows, 1.6644370157245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33690
      00-03          Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382446934813E10 rows, 1.6644370147245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33689
      00-04            Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382346934813E10 rows, 1.6644370107245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33688
      00-05              SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587382246934813E10 rows, 1.6644370067245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33687
      00-06                TopN(limit=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587373179472916E10 rows, 1.664436916049882E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33686
      00-07                  HashAgg(group=[{0, 1, 2, 3, 4}]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587364112011019E10 rows, 1.6644296869003403E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33685
      00-08                    Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 90674.61896625, cumulative cost = {1.5587273437392052E10 rows, 1.664393417052754E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33684
      00-09                      HashToRandomExchange(dist0=[[$0]], dist1=[[$1]], dist2=[[$2]], dist3=[[$3]], dist4=[[$4]]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 90674.61896625, cumulative cost = {1.5587182762773085E10 rows, 1.6643888833218057E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33683
      

      The plan contains two consecutive Limit(fetch=[100]) operators, 00-03 and 00-04. Only one should be needed.
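
      The duplicate can also be spotted mechanically. The following is a minimal Python sketch, not part of Drill, that scans a text plan for adjacent Limit operators with identical arguments. The helper names (parse_plan, find_duplicate_limits) are hypothetical, the plan excerpt is abbreviated from the one above with row types and costs trimmed, and the check assumes the plan is a simple chain in which adjacent lines are parent and child, as in this excerpt.

```python
import re

# Matches the operator id and name at the start of a Drill text-plan line,
# e.g. "00-03          Limit(fetch=[100]) : rowType = ..."
PLAN_LINE = re.compile(r"^\s*(\d+-\d+)\s+(\w+)(?:\(([^)]*)\))?")


def parse_plan(plan_text):
    """Return a list of (op_id, op_name, args) tuples in plan order."""
    ops = []
    for line in plan_text.splitlines():
        m = PLAN_LINE.match(line)
        if m:
            ops.append((m.group(1), m.group(2), m.group(3) or ""))
    return ops


def find_duplicate_limits(ops):
    """Return id pairs of adjacent Limit operators with identical arguments."""
    pairs = []
    for (id_a, name_a, args_a), (id_b, name_b, args_b) in zip(ops, ops[1:]):
        if name_a == name_b == "Limit" and args_a == args_b:
            pairs.append((id_a, id_b))
    return pairs


# Abbreviated copy of operators 00-02 through 00-06 from the reported plan
# (row types and cumulative costs trimmed for brevity).
plan = """
00-02        SelectionVectorRemover : rowcount = 100.0
00-03          Limit(fetch=[100]) : rowcount = 100.0
00-04            Limit(fetch=[100]) : rowcount = 100.0
00-05              SelectionVectorRemover : rowcount = 9067.461896625
00-06                TopN(limit=[100]) : rowcount = 9067.461896625
"""

print(find_duplicate_limits(parse_plan(plan)))  # [('00-03', '00-04')]
```

      A check like this could be run over the plans of a regression suite to flag other queries with the same pattern. It only compares neighbouring lines, so it would need to take indentation into account for plans with branches.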

          People

            priteshm Pritesh Maker
            rhou Robert Hou