Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.0.0
None
Description
Consider TestWorker.minorWithOpenInMiddle(). Because there is an open transaction (txnId=23), there is no meaningful minor compaction work to do. The system nevertheless tries to compact the single delta file covering the 21-22 transaction ID range, effectively copying the file onto itself.
This (1) is inefficient and (2) can potentially affect a reader.
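A minimal sketch of the check a compactor could make before starting a minor compaction. It assumes each delta is represented only by its (min_txn, max_txn) range and that any delta reaching the lowest open transaction must be left alone. The function names and the (23, 25) delta are hypothetical, and this is not Hive's actual implementation.

```python
from typing import List, Tuple

# Hypothetical representation: a delta is just its (min_txn, max_txn) range.
Delta = Tuple[int, int]


def compactable_deltas(deltas: List[Delta], min_open_txn: int) -> List[Delta]:
    """Return the deltas a minor compaction may merge, sorted by range.

    A delta whose max txn ID is >= the lowest open txn is excluded,
    because its contents are not final yet.
    """
    return sorted(d for d in deltas if d[1] < min_open_txn)


def has_minor_compaction_work(deltas: List[Delta], min_open_txn: int) -> bool:
    """Minor compaction only does useful work when it can merge 2+ deltas.

    With a single eligible delta, "compacting" would just rewrite that
    delta under a new name, which is the behavior described in this issue.
    """
    return len(compactable_deltas(deltas, min_open_txn)) >= 2


if __name__ == "__main__":
    # Open txn 23 sits "in the middle": only the 21-22 delta is eligible.
    deltas = [(21, 22), (23, 25)]  # (23, 25) is a hypothetical later delta
    print(compactable_deltas(deltas, min_open_txn=23))         # [(21, 22)]
    print(has_minor_compaction_work(deltas, min_open_txn=23))  # False
```

The threshold of two deltas reflects the point of the report: merging fewer than two inputs cannot reduce the number of files readers must open.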
An example from a real cluster. Suppose we start with:
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:03 /user/hive/warehouse/t/base_0000016
-rw-r--r-- 1 ekoifman staff 602 2016-06-09 16:03 /user/hive/warehouse/t/base_0000016/bucket_00000
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:07 /user/hive/warehouse/t/base_0000017
-rw-r--r-- 1 ekoifman staff 588 2016-06-09 16:07 /user/hive/warehouse/t/base_0000017/bucket_00000
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:07 /user/hive/warehouse/t/delta_0000017_0000017_0000
-rw-r--r-- 1 ekoifman staff 514 2016-06-09 16:06 /user/hive/warehouse/t/delta_0000017_0000017_0000/bucket_00000
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:07 /user/hive/warehouse/t/delta_0000018_0000018_0000
-rw-r--r-- 1 ekoifman staff 612 2016-06-09 16:07 /user/hive/warehouse/t/delta_0000018_0000018_0000/bucket_00000
Then run:
alter table T compact 'minor';
and we end up with:
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:07 /user/hive/warehouse/t/base_0000017
-rw-r--r-- 1 ekoifman staff 588 2016-06-09 16:07 /user/hive/warehouse/t/base_0000017/bucket_00000
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:11 /user/hive/warehouse/t/delta_0000018_0000018
-rw-r--r-- 1 ekoifman staff 500 2016-06-09 16:11 /user/hive/warehouse/t/delta_0000018_0000018/bucket_00000
drwxr-xr-x - ekoifman staff 0 2016-06-09 16:07 /user/hive/warehouse/t/delta_0000018_0000018_0000
-rw-r--r-- 1 ekoifman staff 612 2016-06-09 16:07 /user/hive/warehouse/t/delta_0000018_0000018_0000/bucket_00000
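So compaction created a new directory, /user/hive/warehouse/t/delta_0000018_0000018, which is just a rewrite of the only delta above base_0000017, namely delta_0000018_0000018_0000.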
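To see why this compaction had nothing to merge, the sketch below classifies the directory names from the first listing (last path segment only). It assumes the naming visible there: base_<txn> and delta_<min>_<max> with an optional _<stmtId> suffix, and treats a delta as covered by the newest base when its max txn ID is not above the base's. The helper name is hypothetical and this is not Hive's own directory-parsing code.

```python
import re
from typing import List, Optional, Tuple

# Naming patterns as seen in the listing (assumed; not exhaustive).
BASE_RE = re.compile(r"^base_(\d+)$")
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def deltas_above_base(dir_names: List[str]) -> Tuple[Optional[int], List[str]]:
    """Return the newest base txn ID and the deltas not already covered by it."""
    base_txn: Optional[int] = None
    deltas: List[Tuple[int, int, str]] = []
    for name in dir_names:
        if m := BASE_RE.match(name):
            txn = int(m.group(1))
            base_txn = txn if base_txn is None else max(base_txn, txn)
        elif m := DELTA_RE.match(name):
            deltas.append((int(m.group(1)), int(m.group(2)), name))
    # A delta whose max txn ID is <= the base txn ID is already in the base.
    floor = base_txn if base_txn is not None else -1
    live = [name for lo, hi, name in sorted(deltas) if hi > floor]
    return base_txn, live


if __name__ == "__main__":
    before = [
        "base_0000016",
        "base_0000017",
        "delta_0000017_0000017_0000",
        "delta_0000018_0000018_0000",
    ]
    base, live = deltas_above_base(before)
    print(base, live)  # 17 ['delta_0000018_0000018_0000']
    # Only one delta is left above the base, so a minor compaction
    # has nothing to merge and could be skipped instead of rewriting it.
    print("skip" if len(live) < 2 else "compact")  # skip
```

The walrus operator requires Python 3.8 or later.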
Attachments
Issue Links
- fixes
  HIVE-20901 running compactor when there is nothing to do produces duplicate data (Closed)
- is related to
  HIVE-21266 Don't run cleaner if compaction is skipped (issue with single delta file) (Closed)
  HIVE-16669 Fine tune Compaction to take advantage of Acid 2.0 (Open)
- relates to
  HIVE-20901 running compactor when there is nothing to do produces duplicate data (Closed)