Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Description
The Hudi timeline can miss instants during an incremental pull from an upstream Hudi table that is written by several writers.
For example, say two writers, w1 and w2, are writing data to the Hudi table, and the end timestamp of the last successful incremental pull is 001.
w1 is writing instant 002 and w2 is writing instant 003. If w2 finishes before w1, the incremental pull end timestamp is updated to 003, and w1's commit 002 is then skipped because its instant time is earlier than w2's.
We need to use the commit end time (the state transition time) to filter commits during incremental pulls. Because w2's state transition time is earlier than w1's, w1's commit completes after the recorded end point and is not filtered out.
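The scenario can be sketched as a small simulation. This is only an illustration, not Hudi code: the `Instant` class, the two `pull_by_*` helpers, and the integer completion times are hypothetical stand-ins for the timeline and the state transition time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    """Hypothetical model of a completed commit on the timeline."""
    instant_time: str     # assigned when the writer starts the commit
    completion_time: int  # stand-in for the state transition time


def pull_by_instant_time(completed, checkpoint):
    # Current behaviour described in this issue: filter on instant time.
    return [i for i in completed if i.instant_time > checkpoint]


def pull_by_completion_time(completed, checkpoint):
    # Proposed behaviour: filter on the time the commit actually finished.
    return [i for i in completed if i.completion_time > checkpoint]


w1 = Instant("002", completion_time=20)  # started first, finished last
w2 = Instant("003", completion_time=10)  # started last, finished first

# Pull 1 runs after w2 has finished but before w1 has.
first = pull_by_instant_time([w2], "001")
checkpoint = max(i.instant_time for i in first)  # end timestamp becomes "003"

# Pull 2 runs after w1 has finished; 002 < 003, so w1's commit is skipped.
second = pull_by_instant_time([w1, w2], checkpoint)

# Same two pulls, but the checkpoint tracks completion time instead.
first_ct = pull_by_completion_time([w2], 0)
checkpoint_ct = max(i.completion_time for i in first_ct)
second_ct = pull_by_completion_time([w1, w2], checkpoint_ct)

print("by instant time:   ", [i.instant_time for i in second])
print("by completion time:", [i.instant_time for i in second_ct])
```

With instant-time filtering the second pull returns nothing, so commit 002 is lost. With completion-time filtering the second pull returns w1's commit.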
This relates to HUDI-1623, but instead of adding the end time to the end of each commit, it uses `FileStatus.getModificationTime` to represent the end time.