Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.17.0
None
Description
The functionality introduced with ad920e69f doesn't handle the case where a scheduled major delta compaction produces an empty rowset, which leads to errors like the one below once the compaction has run its course:
W20240906 10:59:01.768857 189660 tablet_mm_ops.cc:364] T 64144a1d4b864aa080e6cc53056546a5 P 574954b3b13a415c83a1660e7f51ee4e: Major delta compaction failed on 64144a1d4b864aa080e6cc53056546a5: Corruption: Failed major delta compaction on RowSet(1675): No min key found: CFile base data in RowSet(1675)
Similarly, mt-tablet-test fails sporadically due to the same issue when the test workload happens to create a similar situation, with rowsets in which all the rows are deleted:
MultiThreadedHybridClockTabletTest/5.UpdateNoMergeCompaction: src/kudu/tablet/mt-tablet-test.cc:489: Failure Failed Bad status: Corruption: Failed major delta compaction on RowSet(1): No min key found: CFile base data in RowSet(1)
There is a simple test scenario that triggers the issue: https://gerrit.cloudera.org/#/c/21809/.
As a workaround, it's possible to set the --all_delete_op_delta_file_cnt_for_compaction flag to a very high value, e.g. 1000000.
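For illustration, assuming the tablet server reads its flags from a gflags flag file (how flags are supplied depends on the deployment and is not stated in this report), the workaround entry might look like this:

```
# Workaround: make the all-delete-ops threshold effectively unreachable.
--all_delete_op_delta_file_cnt_for_compaction=1000000
```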
To address the issue properly, it's necessary to update the major delta compaction code to handle situations where the result rowset is completely empty. In theory, swapping out the result rowset with an empty one should be enough: for example, see how it's done in changelist 705954872.
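The idea can be sketched with a toy model. This is not Kudu's code: `ToyRow`, `ToyRowSet`, `Outcome`, and `FinishMajorDeltaCompaction` are hypothetical names invented for illustration, and the sketch assumes that rows arrive sorted by key. It only shows the control flow of checking for an empty result before anything asks for a min key, and swapping the input rowset for nothing in that case.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT Kudu's real classes.
struct ToyRow {
  std::string key;
  bool deleted;
};

struct ToyRowSet {
  std::vector<std::string> live_keys;  // assumed sorted by key

  bool Empty() const { return live_keys.empty(); }

  // Returns false when there is no min key, i.e. the rowset has no rows.
  bool MinKey(std::string* key) const {
    if (live_keys.empty()) return false;
    *key = live_keys.front();
    return true;
  }
};

enum class Outcome { kSwappedWithCompacted, kSwappedWithEmpty };

// Applies the delete deltas to the base rows, keeping only the live ones.
ToyRowSet ApplyDeletes(const std::vector<ToyRow>& rows) {
  ToyRowSet out;
  for (const ToyRow& r : rows) {
    if (!r.deleted) out.live_keys.push_back(r.key);
  }
  return out;
}

// Replaces the rowset at `idx` with the compaction result. If every row was
// deleted, the input rowset is dropped (swapped with an empty set) instead of
// failing later on a missing min key.
Outcome FinishMajorDeltaCompaction(const std::vector<ToyRow>& rows,
                                   std::vector<ToyRowSet>* rowsets,
                                   std::size_t idx) {
  ToyRowSet result = ApplyDeletes(rows);
  if (result.Empty()) {
    rowsets->erase(rowsets->begin() + static_cast<std::ptrdiff_t>(idx));
    return Outcome::kSwappedWithEmpty;
  }
  (*rowsets)[idx] = std::move(result);
  return Outcome::kSwappedWithCompacted;
}
```

In this sketch the emptiness check happens before the result is installed, so a caller never sees a rowset for which `MinKey` fails. Whether the actual fix takes the same shape is not stated in this report.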