Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
None
Description
query like:
create table ... as select ... from ... limit offset or: insert overwrite table ... select ... from ... limit offset
will cause data loss.
reproduce step:
create table test_limit_offset (id int); insert into test_limit_offset values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16); drop table if exists test_limit_offset2; create table test_limit_offset2 as select * from test_limit_offset limit 5 offset 2;
query test_limit_offset2
+------------------------+ | test_limit_offset2.id | +------------------------+ | 5 | | 6 | | 7 | +------------------------+
expected 5 numbers while got 3;
We can see the problem from the execution plan
+----------------------------------------------------+ | Explain | +----------------------------------------------------+ | Plan optimized by CBO. | | | | Vertex dependency in root stage | | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) | | | | Stage-3 | | Stats Work{} | | Stage-9 | | Create Table Operator: | | name:dgz.test_limit_offset2 | | Stage-2 | | Dependency Collection{} | | Stage-5(CONDITIONAL) | | Move Operator | | Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) | | Conditional Operator | | Stage-1 | | Reducer 2 | | File Output Operator [FS_6] | | table:{"name:":"dgz.test_limit_offset2"} | | Limit [LIM_5] (rows=5 width=1) | //reduce side full limit offset | Number of rows:5,Offset of rows:2 | | Select Operator [SEL_4] (rows=5 width=1) | | Output:["_col0"] | | <-Map 1 [CUSTOM_SIMPLE_EDGE] | | PARTITION_ONLY_SHUFFLE [RS_3] | | Limit [LIM_2] (rows=5 width=1) | //map side full limit offset | Number of rows:5,Offset of rows:2 | | Select Operator [SEL_1] (rows=13 width=1) | | Output:["_col0"] | | TableScan [TS_0] (rows=13 width=1) | | dgz@test_limit_offset,test_limit_offset,Tbl:COMPLETE,Col:NONE,Output:["id"] | | Stage-4(CONDITIONAL) | | File Merge | | Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) | | Stage-7 | | Move Operator | | Stage-6(CONDITIONAL) | | File Merge | | Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) | | Stage-0 | | Move Operator | | Please refer to the previous Stage-5(CONDITIONAL) | | Please refer to the previous Stage-4(CONDITIONAL) | | Please refer to the previous Stage-7 | | | +----------------------------------------------------+
It generate limit operator on both map and reduce side.