Apache Hudi / HUDI-6087

Hudi streaming-read is not stopping with savepoint correctly

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.1
    • Component/s: None

    Description

      Flink supports stopping a job with a savepoint, as documented here:

      https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#stopping-a-job-with-savepoint

       

      Stopping with a savepoint invokes the following three Flink interface functions, in this order:

      1. cancel()
      2. snapshotState()
      3. close()

       

      However, with the current implementation, issuedInstant is already null by the time snapshotState() is invoked. This is because cancel() runs first and sets issuedInstant to null, so snapshotState() adds a null value to the ListState.
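
      The call order described above can be reproduced with a small, dependency-free model. This is only a sketch of the failure mode, not Hudi's actual source: the class name, the issue() and savedState() helpers, and the instant value are hypothetical, and a plain java.util.List stands in for Flink's ListState.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a streaming-read monitor function.
 * Field and method names follow the issue text; this is NOT Hudi's code.
 */
class StopWithSavepointSketch {
    // Last instant handed to the downstream readers; null means "nothing issued".
    private String issuedInstant;
    // Stand-in for the Flink ListState that the savepoint persists.
    private final List<String> instantState = new ArrayList<>();

    // Hypothetical helper: record that an instant has been issued.
    void issue(String instant) {
        issuedInstant = instant;
    }

    // Step 1 of stop-with-savepoint: as described in the issue, the field is reset here.
    void cancel() {
        issuedInstant = null;
    }

    // Step 2: the snapshot stores whatever the field holds at this moment.
    void snapshotState() {
        instantState.clear();
        instantState.add(issuedInstant);
    }

    List<String> savedState() {
        return instantState;
    }

    public static void main(String[] args) {
        StopWithSavepointSketch monitor = new StopWithSavepointSketch();
        monitor.issue("20230414120000"); // illustrative instant time
        monitor.cancel();                // runs before the snapshot is taken
        monitor.snapshotState();
        System.out.println("saved state: " + monitor.savedState()); // saved state: [null]
    }
}
```

      In this model the savepoint ends up holding a single null entry instead of the last issued instant, which is the state the issue reports.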

       

      As a result, the savepoint does not record the last issued instant, so a job resumed from it falls back to read.start.commit. This leads to:

      1. data loss if read.start.commit is left unset, so that the default (LATEST) applies
      2. duplicated data if read.start.commit is set to EARLIEST
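
      The two outcomes above can be illustrated with a simplified resume model. Everything here is an assumption made for illustration: the ResumeSketch class, the commitsToRead() helper, the "earliest" string, the commit names t1 to t4, and the rule that LATEST reads only the newest commit are hypothetical and may not match Hudi's exact start-commit semantics.

```java
import java.util.List;

/** Hypothetical model of how a restarted reader picks its commits. */
class ResumeSketch {
    /**
     * @param timeline        all commits on the table, oldest first
     * @param restoredInstant instant restored from the savepoint, or null
     * @param readStartCommit "earliest", or null to model the default (LATEST)
     */
    static List<String> commitsToRead(List<String> timeline,
                                      String restoredInstant,
                                      String readStartCommit) {
        if (timeline.isEmpty()) {
            return timeline;
        }
        if (restoredInstant != null) {
            // Healthy savepoint: continue strictly after the last issued instant.
            int idx = timeline.indexOf(restoredInstant);
            return timeline.subList(idx + 1, timeline.size());
        }
        if ("earliest".equals(readStartCommit)) {
            // No usable state: start over from the first commit.
            return timeline;
        }
        // No usable state and no start commit: begin at the newest commit only.
        return timeline.subList(timeline.size() - 1, timeline.size());
    }

    public static void main(String[] args) {
        // t1 and t2 were consumed before the stop; t3 and t4 arrived while stopped.
        List<String> timeline = List.of("t1", "t2", "t3", "t4");
        System.out.println(commitsToRead(timeline, "t2", null));       // [t3, t4]
        System.out.println(commitsToRead(timeline, null, null));       // [t4]
        System.out.println(commitsToRead(timeline, null, "earliest")); // [t1, t2, t3, t4]
    }
}
```

      With a correctly saved instant the reader continues at t3. With a null state, the LATEST case skips t3 (data loss) and the EARLIEST case re-reads t1 and t2 (duplicates).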

       

       

       


            People

              Assignee: voonhous voon
              Reporter: voonhous voon