[FLINK-17808] Rename checkpoint meta file to "_metadata" until it has completed writing - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.10.0
Fix Version/s: 1.15.0
Component/s: Runtime / Checkpointing
Labels:
- pull-request-available

Description

In practice, some developers or customers use a strategy of locating the most recent "_metadata" file and treating it as the checkpoint to recover from (e.g., as many proposals in FLINK-9043 suggest). However, the existence of a "_metadata" file does not mean the checkpoint has completed, because the write that creates the "_metadata" file can be interrupted by a forced shutdown (e.g., yarn application -kill).

We could have the checkpoint metadata stream write to a file named "_metadata.inprogress" and rename it to "_metadata" once writing has completed. That way, a file named "_metadata" is never a partially written one.
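
The write-then-rename idea can be sketched roughly as follows. This is only an illustration on a local file system using `java.nio.file`, not the code from the linked pull request and not Flink's own `FileSystem` abstraction. The class `MetadataCommitSketch` and the methods `writeAndCommit` and `isCompleted` are hypothetical names introduced here. It also assumes the underlying file system supports an atomic rename, which is not guaranteed everywhere (some object stores, for example, implement rename as copy plus delete).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: commit checkpoint metadata via write-then-rename. */
public class MetadataCommitSketch {
    static final String FINAL_NAME = "_metadata";
    static final String IN_PROGRESS_NAME = "_metadata.inprogress";

    /**
     * Writes the payload to "_metadata.inprogress" and renames it to
     * "_metadata" only after the stream has been closed successfully.
     */
    public static Path writeAndCommit(Path checkpointDir, byte[] payload) throws IOException {
        Files.createDirectories(checkpointDir);
        Path inProgress = checkpointDir.resolve(IN_PROGRESS_NAME);
        Path finalPath = checkpointDir.resolve(FINAL_NAME);

        // If the process is killed here, only the ".inprogress" file is left behind.
        try (OutputStream out = Files.newOutputStream(inProgress)) {
            out.write(payload);
        }

        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file
        // system cannot perform the rename atomically.
        return Files.move(inProgress, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    /** A recovery tool would treat a checkpoint as complete only if "_metadata" exists. */
    public static boolean isCompleted(Path checkpointDir) {
        return Files.isRegularFile(checkpointDir.resolve(FINAL_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chk-sketch");
        System.out.println("completed before write: " + isCompleted(dir));
        writeAndCommit(dir, new byte[] {1, 2, 3});
        System.out.println("completed after write: " + isCompleted(dir));
    }
}
```

With this layout, a tool that scans for the most recent "_metadata" file would skip any checkpoint directory that holds only "_metadata.inprogress", since that file indicates an interrupted write.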

Issue Links

is related to

FLINK-9043 Introduce a friendly way to resume the job from externalized checkpoints automatically

Reopened

FLINK-9325 generate the _meta file for checkpoint only when the writing is truly successful

Closed

FLINK-22008 writing metadata is not an atomic operation, we should add a commit logic

Closed

supersedes

FLINK-22008 writing metadata is not an atomic operation, we should add a commit logic

Closed

links to

GitHub Pull Request #18157

People

Assignee: Junfan Zhang

Reporter: Yun Tang

Votes: 2

Watchers: 14

Dates

Created: 19/May/20 08:03

Updated: 28/Mar/22 08:38

Resolved: 24/Jan/22 13:49