[HIVE-22474] Query based major compaction always creates only one bucket file - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Hive
Labels: None

Description

set hive.execution.engine=mr;
drop table if exists tbl2;
create table tbl2 (a int, b int) clustered by (a) into 2 buckets stored as ORC TBLPROPERTIES('bucketing_version'='2', 'transactional'='true', 'compactorthreshold.hive.compactor.delta.num.threshold'='3');
insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
delete from tbl2 where b = 2;
insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
delete from tbl2 where a = 1;

With the above use case, once the major compaction finishes, the base directory contains only one bucket file, even though the table is bucketed into 2 buckets. Before the compaction runs, the delta directories contain the expected number of bucket files, and the data is split across them accordingly.
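
The statements above do not show how the compaction is triggered or how the result is inspected. A minimal sketch of those steps follows. It assumes that query-based compaction is enabled through hive.compactor.crud.query.based in the configuration used by the compactor and that a compactor worker is running; neither detail is stated in this report.

-- Assumption: hive.compactor.crud.query.based=true is set for the compactor (not shown in the report).
-- Request a major compaction of the table.
alter table tbl2 compact 'major';
-- Check the state of the request; wait until it is no longer initiated or working.
show compactions;
-- List the files that back the table's rows; with 2 buckets, two bucket files would be expected under the base directory.
select distinct INPUT__FILE__NAME from tbl2;

If the last query returns a single bucket file under the base directory, that matches the behavior described here.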

Issue Links

is duplicated by

HIVE-23703 Major QB compaction with multiple FileSinkOperators results in data loss and one original file

Closed

People

Assignee: László Pintér

Reporter: László Pintér

Votes: 0

Watchers: 1

Dates

Created: 08/Nov/19 14:31

Updated: 17/Jun/20 10:17

Resolved: 17/Jun/20 10:17