Hive / HIVE-15146

Too many Stats-Aggr Operators in multi-insert

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Planning
    • Labels: None

    Description

      Consider:

      create table if not exists  srcpart (a int, b int, c int)
      partitioned by (z int)
      clustered by (a) into 2 buckets
      stored as orc
      tblproperties("transactional"="true");
      
      
      create temporary table if not exists data1 (x int);
      
      insert into data1 values (1),(2),(3);
      
      explain from data1
      insert into srcpart partition(z) select 0,0,1,x
      insert into srcpart partition(z=1) select 0,0,1;
      

      Then the plan looks like:

      2016-11-07T16:56:19,045  INFO [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
        Stage-2 is a root stage
        Stage-0 depends on stages: Stage-2
        Stage-3 depends on stages: Stage-0
        Stage-4 depends on stages: Stage-2
        Stage-1 depends on stages: Stage-4
        Stage-5 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-2
          Map Reduce
            Map Operator Tree:
                TableScan
                  alias: data1
                  Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: x (type: int)
                    outputColumnNames: _col3
                    Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      sort order:
                      Map-reduce partition columns: 0 (type: int)
                      Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col3 (type: int)
                  Select Operator
                    Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      table:
                          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                          serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            Reduce Operator Tree:
              Select Operator
                expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.srcpart
      
        Stage: Stage-0
          Move Operator
            tables:
                partition:
                  z
                replace: false
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.srcpart
      
        Stage: Stage-3
          Stats-Aggr Operator
      
        Stage: Stage-4
          Map Reduce
            Map Operator Tree:
                TableScan
                  Reduce Output Operator
                    sort order:
                    Map-reduce partition columns: 0 (type: int)
                    Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Reduce Operator Tree:
              Select Operator
                expressions: 0 (type: int), 0 (type: int), 1 (type: int)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                      output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                      serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                      name: default.srcpart
      
        Stage: Stage-1
          Move Operator
            tables:
                partition:
                  z 1
                replace: false
                table:
                    input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                    output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                    serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                    name: default.srcpart
      
        Stage: Stage-5
          Stats-Aggr Operator
      

      Note that there are 2 stats aggregation tasks (Stage-3 and Stage-5), even though both branches of the multi-insert write to the same table and can update the same partition (z=1).

      Once HIVE-14943 is in, there will be other ways to generate the same situation.

      In particular, it will be possible for 2 or 3 branches of the multi-insert, any or all of them, to use dynamic partition insert, which means the set of partitions actually updated is not known until run time.

      If at all possible, the solution should address this.
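
      To make the overlap concrete, here is a minimal sketch in Python (not Hive code) of which partitions each branch of the example above writes. The function name and branch labels are hypothetical, and the partition sets are derived by hand from the three rows inserted into data1; nothing here reflects how Hive's planner or stats tasks are actually implemented.

```python
from collections import defaultdict


def partitions_by_writer(branches):
    """Map each distinct partition to the list of branches that write it.

    `branches` is a hypothetical {branch_name: set_of_partitions} mapping;
    for a dynamic partition insert the set is only known at run time.
    """
    touched = defaultdict(list)
    for name, partitions in branches.items():
        for partition in partitions:
            touched[partition].append(name)
    return dict(touched)


# data1 holds x = 1, 2, 3 (from the INSERT in the description).
rows = [1, 2, 3]

branches = {
    # insert into srcpart partition(z) select 0,0,1,x  -> z takes each x
    "branch-1 (dynamic z)": {("z", x) for x in rows},
    # insert into srcpart partition(z=1) select 0,0,1  -> always z=1
    "branch-2 (static z=1)": {("z", 1)},
}

touched = partitions_by_writer(branches)

# Partitions written by more than one branch would have their stats
# aggregated by more than one per-branch Stats-Aggr task.
shared = sorted(p for p, writers in touched.items() if len(writers) > 1)

for partition in sorted(touched):
    print(partition, "written by", touched[partition])
print("written by more than one branch:", shared)
```

      Under these assumptions, the union of partitions is only known once every branch has run, which is why a fix that handles dynamic partition inserts cannot rely on the partition specs visible at compile time.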

            People

              Assignee: Pengcheng Xiong (pxiong)
              Reporter: Eugene Koifman (ekoifman)