FLINK-23426

Support changelog processing in batch mode

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Table SQL / API
    • Labels: None

    Description

      The DataStream API can execute arbitrary DataStream programs when running in batch mode. However, this is not the case for the Table API in batch mode. For example, a source that produces changes other than inserts is not supported, and updates and deletes cannot be emitted.

      In theory, we could make this work by running the "stream mode" of the planner (CDC transformations) on top of the "batch mode" of the DataStream API (specialized state backend, sorted inputs). Whether and how we expose such functionality is up for discussion.
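
A rough illustration of that idea, in plain Python rather than Flink code: the changelog is sorted by key so that state for only one key is held at a time, and a changelog-aware aggregation (here a SUM) is applied per key. The function name, the `(kind, key, value)` tuple layout, and the choice of SUM are assumptions made for this sketch; only the kind strings mirror Flink's RowKind short forms.

```python
from itertools import groupby
from operator import itemgetter

# Change kinds, written like Flink's RowKind short strings.
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"


def sum_per_key_sorted(changes):
    """Apply a changelog-aware SUM per key over (kind, key, value) tuples.

    Hypothetical helper: the input is sorted by key first, so only the
    state of the current key has to be kept in memory.
    """
    results = []
    # sorted() is stable, so the order of changes within a key is preserved.
    ordered = sorted(changes, key=itemgetter(1))
    for key, group in groupby(ordered, key=itemgetter(1)):
        total, count = 0, 0  # state for the current key only
        for kind, _, value in group:
            if kind in (INSERT, UPDATE_AFTER):
                total += value
                count += 1
            else:  # UPDATE_BEFORE or DELETE retracts an earlier row
                total -= value
                count -= 1
        if count > 0:  # keys whose rows were all retracted produce no result
            results.append((key, total))
    return results
```

Because the input is bounded, the sketch emits only the final value per key instead of a stream of intermediate updates.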

      If we don't allow enabling incremental updates, we could instead add a special batch operator that materializes the incoming changes for a batch pipeline. However, this would require "complete" CDC logs (i.e. no missing UPDATE_AFTER).
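
What such a materializing operator might do can be sketched in plain Python, assuming every row carries a unique key. The `materialize` function and the `(kind, key, row)` tuple layout are hypothetical and are not part of any Flink API. The sketch rejects a log in which an UPDATE_BEFORE is never followed by its UPDATE_AFTER, which is one reading of the "complete" requirement.

```python
# Change kinds, written like Flink's RowKind short strings.
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"


def materialize(changes):
    """Collapse a changelog of (kind, key, row) tuples into the final rows.

    Hypothetical helper: returns a dict mapping each surviving key to its
    latest row and raises ValueError if the changelog is incomplete.
    """
    table = {}
    pending = set()  # keys that saw UPDATE_BEFORE but no UPDATE_AFTER yet
    for kind, key, row in changes:
        if kind == INSERT:
            table[key] = row
        elif kind == UPDATE_BEFORE:
            table.pop(key, None)
            pending.add(key)
        elif kind == UPDATE_AFTER:
            table[key] = row
            pending.discard(key)
        elif kind == DELETE:
            table.pop(key, None)
        else:
            raise ValueError(f"unknown change kind: {kind!r}")
    if pending:
        raise ValueError(f"incomplete changelog, missing UPDATE_AFTER for keys: {sorted(pending)}")
    return table
```

With this shape, downstream batch operators would only ever see insert-only rows, namely the values of the returned table.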

            People

              Assignee: Unassigned
              Reporter: Timo Walther (twalthr)
              Votes: 0
              Watchers: 5
