Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
Consider:
create table if not exists srcpart (a int, b int, c int) partitioned by (z int) clustered by (a) into 2 buckets stored as orc tblproperties("transactional"="true");
create temporary table if not exists data1 (x int);
insert into data1 values (1),(2),(3);
explain from data1
  insert into srcpart partition(z) select 0,0,1,x
  insert into srcpart partition(z=1) select 0,0,1;
Then the plan looks like:
2016-11-07T16:56:19,045 INFO [main] ql.TestTxnCommands2:
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-4 depends on stages: Stage-2
  Stage-1 depends on stages: Stage-4
  Stage-5 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: data1
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: x (type: int)
              outputColumnNames: _col3
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                sort order:
                Map-reduce partition columns: 0 (type: int)
                Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col3 (type: int)
            Select Operator
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
              File Output Operator
                compressed: false
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
          outputColumnNames: _col0, _col1, _col2, _col3
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            z
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order:
              Map-reduce partition columns: 0 (type: int)
              Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Select Operator
          expressions: 0 (type: int), 0 (type: int), 1 (type: int)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: default.srcpart

  Stage: Stage-1
    Move Operator
      tables:
          partition:
            z 1
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.srcpart

  Stage: Stage-5
    Stats-Aggr Operator
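Note that the plan contains two stats aggregation stages (Stage-3 and Stage-5), one per branch. Yet both branches of the multi-insert update the same partition. The dynamic-partition branch writes z=1, z=2 and z=3, one partition per row of data1. The static branch writes z=1.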
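To make the overlap concrete, the Java sketch below models how each branch of the example query routes the rows of data1. It then counts how often each partition would be aggregated with one stats task per branch. It is a hypothetical model, not Hive code. The class and method names are illustrative, and bucketing and ACID details are ignored.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of the example multi-insert; not Hive code.
// Partitions are represented as "z=<value>" strings.
public class MultiInsertOverlap {

    // Branch 1: insert into srcpart partition(z) select 0,0,1,x
    // Dynamic partitioning routes each row of data1 to partition z = x.
    static Set<String> dynamicBranch(List<Integer> data1) {
        Set<String> parts = new TreeSet<>();
        for (int x : data1) {
            parts.add("z=" + x);
        }
        return parts;
    }

    // Branch 2: insert into srcpart partition(z=1) select 0,0,1
    // Static partitioning routes every row of data1 to z=1.
    static Set<String> staticBranch(List<Integer> data1) {
        Set<String> parts = new TreeSet<>();
        for (int ignored : data1) {
            parts.add("z=1");
        }
        return parts;
    }

    // With one stats aggregation per branch, count how many times each
    // partition would be aggregated.
    static Map<String, Integer> aggregationsPerPartition(List<Set<String>> branches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> branch : branches) {
            for (String p : branch) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> data1 = List.of(1, 2, 3); // insert into data1 values (1),(2),(3)
        Map<String, Integer> counts = aggregationsPerPartition(
                List.of(dynamicBranch(data1), staticBranch(data1)));
        System.out.println(counts); // prints {z=1=2, z=2=1, z=3=1}
    }
}
```

Partition z=1 is counted twice, which corresponds to the two Stats-Aggr stages both covering it.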
Once HIVE-14943 is in, there will be other ways to generate the same situation.
In particular, a multi-insert may have two or three branches, any or all of which use dynamic partition inserts. The set of partitions actually updated is then not known until run time.
If at all possible, the solution should address this.
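One possible shape for a fix is sketched below under stated assumptions; it is not taken from the Hive code base. After running, each branch reports the partitions it actually wrote. A single stats step then aggregates the de-duplicated union, which also covers dynamic-partition branches whose targets are only known at run time. BranchResult, partitionsToAggregate and gatherStats are hypothetical names. The record syntax requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of one stats step per multi-insert; not Hive code.
public class SingleStatsStep {

    // What one branch reports after it has run: its target table and the
    // partitions it actually wrote. For a dynamic-partition branch this list
    // is only known at run time.
    record BranchResult(String table, List<String> writtenPartitions) {}

    // Union the partitions written by all branches, keeping first-seen order,
    // so each table/partition pair appears once however many branches wrote it.
    static List<String> partitionsToAggregate(List<BranchResult> results) {
        Set<String> targets = new LinkedHashSet<>();
        for (BranchResult r : results) {
            for (String p : r.writtenPartitions()) {
                targets.add(r.table() + "/" + p);
            }
        }
        return new ArrayList<>(targets);
    }

    // The single stats step: invoke the caller-supplied aggregation once per target.
    static void gatherStats(List<BranchResult> results, Consumer<String> aggregate) {
        for (String target : partitionsToAggregate(results)) {
            aggregate.accept(target);
        }
    }

    public static void main(String[] args) {
        List<BranchResult> results = List.of(
                new BranchResult("default.srcpart", List.of("z=1", "z=2", "z=3")), // dynamic branch
                new BranchResult("default.srcpart", List.of("z=1")));              // static branch
        gatherStats(results, target -> System.out.println("aggregate stats for " + target));
    }
}
```

With the example query's branches, this prints one line each for default.srcpart/z=1, z=2 and z=3. The second report of z=1 is absorbed by the set.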
Issue Links
blocks: HIVE-15033 Ensure there is only 1 StatsTask in the query plan (Open)