IMPALA-2407

Nested Types: Remove calls to clock_gettime for a 9x performance improvement on EC2

Details

    Description

      Profiling queries against nested types shows that ~90% of the time is spent in clock_gettime.
      A cheaper time-accounting method can speed up nested queries by 8-9x.
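
For scale: if ~90% of the runtime is spent in clock_gettime, eliminating those calls entirely bounds the speedup at 1 / (1 - 0.9) = 10x, which is consistent with the 8-9x estimate. One way to make the accounting cheaper is to read the clock for only a sample of invocations and extrapolate. The sketch below illustrates that idea only. SampledTimer, RunSampled and NowNanos are hypothetical names, not Impala classes, and this is not necessarily the approach taken to resolve this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Reads the monotonic clock in nanoseconds (one clock_gettime call).
inline int64_t NowNanos() {
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical counter: times one in every `period` invocations and
// extrapolates the total from the mean of the sampled durations.
class SampledTimer {
 public:
  explicit SampledTimer(int64_t period) : period_(period) {
    assert(period > 0);
  }

  // Counts the invocation and returns true if it should be timed.
  bool ShouldSample() { return (calls_++ % period_) == 0; }

  // Records the duration of one timed invocation.
  void AddSample(int64_t nanos) {
    sampled_nanos_ += nanos;
    ++samples_;
  }

  int64_t calls() const { return calls_; }
  int64_t samples() const { return samples_; }

  // Estimate of total time: mean sampled duration times number of calls.
  int64_t EstimatedTotalNanos() const {
    if (samples_ == 0) return 0;
    double mean = static_cast<double>(sampled_nanos_) / samples_;
    return static_cast<int64_t>(mean * calls_);
  }

 private:
  int64_t period_;
  int64_t calls_ = 0;
  int64_t samples_ = 0;
  int64_t sampled_nanos_ = 0;
};

// Runs fn() n times, reading the clock only around sampled invocations.
template <typename Fn>
void RunSampled(SampledTimer* timer, int64_t n, Fn fn) {
  for (int64_t i = 0; i < n; ++i) {
    if (timer->ShouldSample()) {
      int64_t start = NowNanos();
      fn();
      timer->AddSample(NowNanos() - start);
    } else {
      fn();
    }
  }
}
```

With a period of 100, 1000 invocations cost 20 clock reads instead of 2000. The trade-off is that the reported time becomes an estimate, which is only reasonable when the invocations have similar cost.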

      select
        count(*)
      from
        customer.orders_string o,
        o.lineitems_string l
      where
        l_shipmode in ('MAIL', 'SHIP')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= '1994-01-01'
        and l_receiptdate < '1995-01-01'
      group by
        l_shipmode
      order by
        l_shipmode
      

      Schema
      ------------------------------------------------------

      name type comment

      ------------------------------------------------------

      c_custkey bigint  
      c_name string  
      c_address string  
      c_nationkey bigint  
      c_phone string  
      c_acctbal double  
      c_mktsegment string  
      c_comment string  
      orders_string array<struct<  
        o_orderkey:bigint,  
        o_orderstatus:string,  
        o_totalprice:double,  
        o_orderdate:string,  
        o_orderpriority:string,  
        o_clerk:string,  
        o_shippriority:bigint,  
        o_comment:string,  
        lineitems_string:array<struct<
          l_partkey:bigint,
          l_suppkey:bigint,
          l_linenumber:bigint,
          l_quantity:double,
          l_extendedprice:double,
          l_discount:double,
          l_tax:double,
          l_returnflag:string,
          l_linestatus:string,
          l_shipdate:string,
          l_commitdate:string,
          l_receiptdate:string,
          l_shipinstruct:string,
          l_shipmode:string,
          l_comment:string
        >>
      >>

      ------------------------------------------------------

      Profile of clock_gettime and the functions that call it

      Function / Call Stack	Effective Time by Utilization	Spin Time	Overhead Time	Module	Function (Full)	Source File	Start Address
      clock_gettime	86.233s	0s	0s	librt.so.1	clock_gettime		0x3e10
        impala::UnnestNode::GetNext	17.552s	0s	0s	impalad	impala::UnnestNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*)		0xca9280
        impala::NestedLoopJoinNode::GetNext	17.380s	0s	0s	impalad	impala::NestedLoopJoinNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*)		0xc77d50
        impala::NestedLoopJoinNode::ConstructBuildSide	17.242s	0s	0s	impalad	impala::NestedLoopJoinNode::ConstructBuildSide(impala::RuntimeState*)		0xc74f10
        impala::UnnestNode::Open	16.830s	0s	0s	impalad	impala::UnnestNode::Open(impala::RuntimeState*)		0xca96c0
        impala::ScopedTimer<impala::MonotonicStopWatch>::~ScopedTimer	8.769s	0s	0s	impalad	impala::ScopedTimer<impala::MonotonicStopWatch>::~ScopedTimer(void)		0x786630
        impala::BlockingJoinNode::Open	8.380s	0s	0s	impalad	impala::BlockingJoinNode::Open(impala::RuntimeState*)		0xcbbdf0
        impala::HdfsScanNode::GetNext	0.040s	0s	0s	impalad	impala::HdfsScanNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*)		0xc23530
        impala::PlanFragmentExecutor::OpenInternal	0.020s	0s	0s	impalad	impala::PlanFragmentExecutor::OpenInternal(void)		0xbeaaf0
        impala::ExecNode::RowBatchQueue::AddBatch	0.020s	0s	0s	impalad	impala::ExecNode::RowBatchQueue::AddBatch(impala::RowBatch*)		0xc0c260
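
To check how expensive clock_gettime is on a particular host, a micro-benchmark along the following lines can help. On Linux the per-call cost depends on the kernel's clock source and on whether the call is served from user space or falls back to a system call. This is a minimal sketch: AvgClockGettimeNanos is a hypothetical helper, and the iteration count is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Returns the average wall-clock cost, in nanoseconds, of one
// clock_gettime(CLOCK_MONOTONIC) call over `iterations` calls.
double AvgClockGettimeNanos(int64_t iterations) {
  assert(iterations >= 0);
  if (iterations == 0) return 0.0;

  timespec start, end, scratch;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int64_t i = 0; i < iterations; ++i) {
    // The call under test; its result is intentionally discarded.
    clock_gettime(CLOCK_MONOTONIC, &scratch);
  }
  clock_gettime(CLOCK_MONOTONIC, &end);

  double elapsed_ns =
      static_cast<double>(end.tv_sec - start.tv_sec) * 1e9 +
      static_cast<double>(end.tv_nsec - start.tv_nsec);
  return elapsed_ns / static_cast<double>(iterations);
}
```

Calling AvgClockGettimeNanos with a few million iterations on the affected host and on a reference machine gives a rough comparison of per-call cost. On older glibc versions, clock_gettime lives in librt, so link with -lrt.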
      

      The explain plan

      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Explain String                                                                                                                                                           |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=544.00MB VCores=2                                                                                                                |
      | WARNING: The following tables are missing relevant table and/or column statistics.                                                                                       |
      | tpch_nested_parquet_30.customer                                                                                                                                          |
      |                                                                                                                                                                          |
      | 09:MERGING-EXCHANGE [UNPARTITIONED]                                                                                                                                      |
      | |  order by: l_shipmode ASC                                                                                                                                              |
      | |                                                                                                                                                                        |
      | 06:SORT                                                                                                                                                                  |
      | |  order by: l_shipmode ASC                                                                                                                                              |
      | |                                                                                                                                                                        |
      | 08:AGGREGATE [FINALIZE]                                                                                                                                                  |
      | |  output: count:merge(*)                                                                                                                                                |
      | |  group by: l_shipmode                                                                                                                                                  |
      | |                                                                                                                                                                        |
      | 07:EXCHANGE [HASH(l_shipmode)]                                                                                                                                           |
      | |                                                                                                                                                                        |
      | 05:AGGREGATE                                                                                                                                                             |
      | |  output: count(*)                                                                                                                                                      |
      | |  group by: l_shipmode                                                                                                                                                  |
      | |                                                                                                                                                                        |
      | 01:SUBPLAN                                                                                                                                                               |
      | |                                                                                                                                                                        |
      | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                                                                                                      |
      | |  |                                                                                                                                                                     |
      | |  |--02:SINGULAR ROW SRC                                                                                                                                                |
      | |  |                                                                                                                                                                     |
      | |  03:UNNEST [o.lineitems_string l]                                                                                                                                      |
      | |                                                                                                                                                                        |
      | 00:SCAN HDFS [tpch_nested_parquet_30.customer.orders_string o]                                                                                                           |
      |    partitions=1/1 files=41 size=15.01GB                                                                                                                                  |
      |    predicates on l: l_shipmode IN ('MAIL', 'SHIP'), l_commitdate < l_receiptdate, l_shipdate < l_commitdate, l_receiptdate >= '1994-01-01', l_receiptdate < '1995-01-01' |
      +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
      

      Query summary

      +---------------------------+--------+----------+----------+--------+------------+-----------+---------------+-------------------------------------------------+ 
      | Operator                  | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                                          | 
      +---------------------------+--------+----------+----------+--------+------------+-----------+---------------+-------------------------------------------------+ 
      | 09:MERGING-EXCHANGE       | 1      | 49.90us  | 49.90us  | 2      | 450.00M    | 0 B       | -1 B          | UNPARTITIONED                                   | 
      | 06:SORT                   | 4      | 286.31us | 345.15us | 2      | 450.00M    | 24.00 MB  | 416.00 MB     |                                                 | 
      | 08:AGGREGATE              | 4      | 258.75ms | 288.16ms | 2      | 450.00M    | 3.27 MB   | 128.00 MB     | FINALIZE                                        | 
      | 07:EXCHANGE               | 4      | 35.41us  | 60.43us  | 8      | 450.00M    | 0 B       | 0 B           | HASH(l_shipmode)                                | 
      | 05:AGGREGATE              | 4      | 356.10ms | 381.49ms | 8      | 450.00M    | 41.33 MB  | 128.00 MB     |                                                 |
      | 01:SUBPLAN                | 4      | 22.51s   | 22.78s   | 0      | 450.00M    | 28.59 MB  | 0 B           |                                                 |
      | |--04:NESTED LOOP JOIN    | 4      | 69.36s   | 69.76s   | 0      | 10         | 0 B       | 16 B          | CROSS JOIN                                      |
      | |  |--02:SINGULAR ROW SRC | 4      | 0ns      | 0ns      | 0      | 1          | 0 B       | 0 B           |                                                 |
      | |  03:UNNEST              | 4      | 18.81s   | 18.94s   | 0      | 10         | 0 B       | 0 B           | o.lineitems_string l                            |
      | 00:SCAN HDFS              | 4      | 408.93ms | 585.28ms | 46.50M | 45.00M     | 304.66 MB | 88.00 MB      | tpch_nested_parquet_30.customer.orders_string o |
      +---------------------------+--------+----------+----------+--------+------------+-----------+---------------+-------------------------------------------------+
      

      Attachments

        1. q12Nested.tar.gz
          3.24 MB
          Mostafa Mokhtar

            People

              jbapple Jim Apple
              mmokhtar Mostafa Mokhtar