  1. Apache Hudi
  2. HUDI-5517

HoodieTimeline: support filtering instants by state transition time

Details

    Description

      The Hudi timeline can miss instants during incremental pulls from an upstream Hudi table that is written by multiple writers.

      For example, suppose two writers are writing to the Hudi table, and the end timestamp of the last successful incremental pull is 001.

      w1 is writing instant 002 and w2 is writing instant 003. If w2 finishes before w1, the incremental pull end timestamp is updated to 003, and w1's commit 002 is then skipped because its instant time is earlier than w2's.

      Incremental pulls need to filter commits by commit end time (state transition time) instead of instant time. Because w2's state transition time is earlier than w1's, w1's commit still falls after the checkpoint once it completes, so it is not skipped.
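
The scenario above can be sketched as a small model. This is only an illustration, not Hudi code: the `Instant` class, its `state_transition_time` field, the two pull functions, and the completion times "004" and "005" are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    instant_time: str           # timestamp assigned when the writer starts
    state_transition_time: str  # hypothetical: time the commit completed


def pull_by_instant_time(completed, checkpoint):
    """Return completed instants whose start time is after the checkpoint."""
    return [i for i in completed if i.instant_time > checkpoint]


def pull_by_transition_time(completed, checkpoint):
    """Return completed instants whose completion time is after the checkpoint."""
    return [i for i in completed if i.state_transition_time > checkpoint]


# w1 starts first (002) but finishes last; w2 starts later (003) but finishes first.
w1 = Instant(instant_time="002", state_transition_time="005")
w2 = Instant(instant_time="003", state_transition_time="004")

# Filtering by instant time: the first pull sees only w2 and moves the checkpoint to 003.
first = pull_by_instant_time([w2], "001")
checkpoint = max(i.instant_time for i in first)
# The second pull runs after w1 completes, but 002 is not after 003, so w1 is missed.
second = pull_by_instant_time([w1, w2], checkpoint)

# Filtering by state transition time: the checkpoint tracks completion times instead.
first_t = pull_by_transition_time([w2], "001")
checkpoint_t = max(i.state_transition_time for i in first_t)
# w1 completed after the checkpoint, so the second pull picks it up.
second_t = pull_by_transition_time([w1, w2], checkpoint_t)
```

In this model `second` is empty while `second_t` contains w1's commit, which is the behaviour the issue asks for.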

      This relates to HUDI-1623, but instead of appending an end time to each commit, it uses `FileStatus.getModificationTime` to represent the end time.
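
As a rough local-filesystem analogy, the idea of taking the end time from the file's modification time might look like the sketch below. It assumes completed commits appear as files named `<instant>.commit` in a timeline directory, and it uses Python's `os.path.getmtime` in place of Hadoop's `FileStatus.getModificationTime`. The helper `completed_instants_with_end_time` is hypothetical and is not part of Hudi.

```python
import os
import tempfile


def completed_instants_with_end_time(timeline_dir, suffix=".commit"):
    """Pair each completed instant with its file modification time, ordered by that time."""
    result = []
    for name in os.listdir(timeline_dir):
        if name.endswith(suffix):
            path = os.path.join(timeline_dir, name)
            # The modification time stands in for the commit's end time.
            result.append((name[: -len(suffix)], os.path.getmtime(path)))
    return sorted(result, key=lambda pair: pair[1])


with tempfile.TemporaryDirectory() as timeline_dir:
    # Instant 003 completes first (mtime 1000), instant 002 completes later (mtime 2000).
    for instant, mtime in (("002", 2000.0), ("003", 1000.0)):
        path = os.path.join(timeline_dir, instant + ".commit")
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # set the times explicitly so the demo is deterministic
    ordered = [instant for instant, _ in completed_instants_with_end_time(timeline_dir)]
```

Here `ordered` lists instant 003 before 002, reflecting completion order instead of instant-time order.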

            People

              Unassigned Unassigned
              Bone An Hui An