Flink / FLINK-17505

Merge small files produced by StreamingFileSink


Details

    Description

      This is an alternative approach to FLINK-11499 for solving the problem of StreamingFileSink creating many small files when writing bulk formats, which have to roll their part files on every checkpoint.

      A merge-based approach would require converting StreamingFileSink from a sink into an operator that works exactly as it does now, with the same limitations (no support for arbitrary rolling policies for bulk formats), followed by a second operator that merges the small files in the background.
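
One way to picture the background merge operator is as a planning step that decides which already-committed small files to combine. The Python sketch below is purely illustrative: the `CommittedFile` type, the `plan_merges` function, the size threshold, and the greedy per-bucket packing are hypothetical and are not part of Flink's StreamingFileSink API or a design settled in this issue. It only plans merges; actually merging bulk-format files such as Parquet would require reading and rewriting their records, because such files generally cannot be concatenated byte-for-byte.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommittedFile:
    """A part file already committed by the (hypothetical) writer operator."""
    bucket: str   # bucket id, i.e. the output subdirectory
    path: str
    size_mb: int


def plan_merges(files, small_threshold, target_size):
    """Group small committed files of the same bucket into merge batches.

    Files whose size is at or above small_threshold are left alone. Within a
    bucket, small files are packed greedily in path order into batches whose
    total size does not exceed target_size. Single-file batches are dropped,
    because "merging" one file would only rewrite it.
    """
    small_by_bucket = defaultdict(list)
    for f in files:
        if f.size_mb < small_threshold:
            small_by_bucket[f.bucket].append(f)

    batches = []
    for bucket in sorted(small_by_bucket):
        current, current_size = [], 0
        for f in sorted(small_by_bucket[bucket], key=lambda c: c.path):
            # Close the current batch if adding this file would overflow it.
            if current and current_size + f.size_mb > target_size:
                if len(current) > 1:
                    batches.append([c.path for c in current])
                current, current_size = [], 0
            current.append(f)
            current_size += f.size_mb
        if len(current) > 1:
            batches.append([c.path for c in current])
    return batches


if __name__ == "__main__":
    files = [
        CommittedFile("2020-05-04--10", "part-0-0", 4),
        CommittedFile("2020-05-04--10", "part-0-1", 3),
        CommittedFile("2020-05-04--10", "part-0-2", 5),
        CommittedFile("2020-05-04--10", "part-1-0", 120),  # large, left alone
        CommittedFile("2020-05-04--11", "part-0-3", 2),    # alone in its bucket
    ]
    print(plan_merges(files, small_threshold=64, target_size=10))
    # prints [['part-0-0', 'part-0-1']]
```

In this example `part-0-2` stays unmerged because adding it to the first batch would exceed the target size, leaving it in a batch of one. Restricting batches to a single bucket keeps each merged file in the same output directory as the files it replaces.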

      In the long term we would probably like to offer both the merge operator and the write-ahead log (WAL) solution described in FLINK-11499 as alternatives: the WAL would behave better when small files are common, while the merge operator could behave better when small files are rare (for example, when they arise from data skew).

People

    Assignee: Unassigned
    Reporter: Piotr Nowojski (pnowojski)
    Votes: 2
    Watchers: 10