Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- None
- 0.25
Description
Problem:
A Hudi table may still contain data that should have been rolled back.
Root cause:
Let's first recall how the rollback plan is generated for the log blocks of a deltaCommit. Hudi takes two cases into consideration.
- Log files with no base file are assumed to consist entirely of inserted records, so they are deleted directly. The assumption here is that all inserted records are covered this way.
- For file IDs that were updated according to the inflight commit metadata of the instant being rolled back, a command block is appended to their log files to roll them back. This handles all updated records.
However, the first assumption does not always hold. Indexes that can index log files may insert records into an existing log file. In the current process, the inflight hoodieCommitMeta is generated before the records are assigned to specific file groups, so such inserts are not reflected in it and neither case above rolls them back.
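The gap can be illustrated with a small toy model. This is only a sketch of the two cases as described above, not Hudi's actual rollback code; `LogFile`, `plan_rollback`, the action strings, and the file IDs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    file_id: str          # hypothetical file group ID
    has_base_file: bool   # whether the file group also has a base file


def plan_rollback(log_files, updated_file_ids):
    """Toy version of the two-case rollback plan for a delta commit.

    `updated_file_ids` stands in for the file IDs that the inflight commit
    metadata of the instant being rolled back lists as updated.
    """
    plan = {}
    for f in log_files:
        if not f.has_base_file:
            # Case 1: no base file, so the log file is assumed insert-only.
            plan[f.file_id] = "DELETE_FILE"
        elif f.file_id in updated_file_ids:
            # Case 2: recorded as updated, so a rollback command block is appended.
            plan[f.file_id] = "APPEND_ROLLBACK_BLOCK"
    return plan


# fg-3 already existed and received inserts from the failed commit, but the
# inflight metadata was built before assignment and only lists fg-2.
files = [LogFile("fg-1", False), LogFile("fg-2", True), LogFile("fg-3", True)]
plan = plan_rollback(files, updated_file_ids={"fg-2"})
print(plan)
```

In this model `fg-3` never enters the plan, so the records inserted into its existing log file survive the rollback.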
Fix:
To fix this problem, hoodieCommitMeta should be generated from the result of the partitioner rather than from the workProfile. The rollback code may also need more comments to call out this case.
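A minimal sketch of why the source of the metadata matters, under the assumption that the work profile only knows file IDs for updates while the partitioner output knows the file ID of every record. The data shapes and the function names `touched_from_work_profile` and `touched_from_partitioner` are hypothetical and do not mirror Hudi's classes.

```python
def touched_from_work_profile(profile):
    """File IDs known before assignment: only updates carry a file ID."""
    return set(profile["updates_by_file_id"])


def touched_from_partitioner(assignments):
    """File IDs known after assignment: every file group that receives records."""
    return {file_id for file_id, _kind in assignments}


# Hypothetical work profile: 10 updates to fg-2 and 5 inserts not yet placed.
profile = {"updates_by_file_id": {"fg-2": 10}, "insert_count": 5}

# Hypothetical partitioner result: the inserts were routed to existing fg-3.
assignments = [("fg-2", "update"), ("fg-3", "insert")]

before = touched_from_work_profile(profile)
after = touched_from_partitioner(assignments)
print(sorted(before), sorted(after))
```

Metadata derived from the second set would let the rollback plan see `fg-3` and append a command block to it.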