Details
Type: Improvement
Status: Open
Priority: Critical
Resolution: Unresolved
None
Description
https://github.com/apache/hudi/pull/7568
The PR above fixes a case where archiving a clustering replacecommit can lead to duplicate data when both the replaced and the new file groups from that replacecommit coexist in the Hudi table.
The new logic is complex. We need to simplify the archival process.
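The duplicate-data case described above can be illustrated with a toy model. This is only a sketch and does not use Hudi's API: `ToyTable`, `ReplaceCommit`, `clean`, `archive`, and `demo` are hypothetical names. It also assumes, beyond what this ticket states, that readers hide replaced file groups based on replacecommit metadata in the active timeline, so archiving that metadata while the replaced files are still on storage makes them visible again.

```python
from dataclasses import dataclass, field


@dataclass
class ReplaceCommit:
    """Hypothetical stand-in for a clustering replacecommit."""
    instant: str
    replaced: set  # ids of the file groups that clustering replaced


@dataclass
class ToyTable:
    """Toy model of a table: file groups on storage plus an active timeline."""
    file_groups: dict  # file group id -> records stored in that file group
    active_timeline: list = field(default_factory=list)

    def visible_file_groups(self):
        # Assumption: a file group is hidden only while a replacecommit
        # that lists it as replaced is still in the active timeline.
        hidden = set()
        for commit in self.active_timeline:
            hidden |= commit.replaced
        return sorted(fg for fg in self.file_groups if fg not in hidden)

    def read(self):
        return [r for fg in self.visible_file_groups()
                for r in self.file_groups[fg]]

    def clean(self, commit):
        # Physically delete the replaced file groups from storage.
        for fg in commit.replaced:
            self.file_groups.pop(fg, None)

    def archive(self, commit):
        # Drop the replacecommit from the active timeline.
        self.active_timeline.remove(commit)


def demo(clean_before_archive):
    table = ToyTable({"fg-1": ["a", "b"], "fg-2": ["c"]})
    # Clustering rewrites fg-1 and fg-2 into a new file group fg-3.
    table.file_groups["fg-3"] = ["a", "b", "c"]
    commit = ReplaceCommit("001", {"fg-1", "fg-2"})
    table.active_timeline.append(commit)
    if clean_before_archive:
        table.clean(commit)
    table.archive(commit)
    return sorted(table.read())
```

In this model, `demo(False)` returns every record twice because the replaced and new file groups coexist once the replacecommit is archived, while `demo(True)` returns each record once. A simpler archival process could enforce a similar ordering constraint, though the actual design is left open by this ticket.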