Details
- Type: Improvement
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Fix Version/s: 1.15.0
Description
Repeatable cleanup was introduced with FLIP-194, but should be considered an independent feature of the JobResultStore (JRS) from a user's point of view.
Repeatable cleanup is triggered by running into an error during cleanup. This can be achieved by disabling access to S3 after the job has finished, e.g.:
- Set a reasonable checkpointing interval (checkpointing should be enabled so that there are artifacts on S3 to clean up)
- Disable S3 access (remove permissions or shut down the S3 server)
- Stop the job with a savepoint
Stopping the job should succeed, but the logs should show the cleanup failing with repeated retries. Re-enabling S3 access should resolve the issue. A minimal job setup for this test is sketched below.
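As a rough illustration of the job-side setup for the first step, the following is a minimal sketch of a streaming job that produces checkpoint artifacts on S3. The bucket name, checkpointing interval, and job logic are assumptions made for this example and are not prescribed by this ticket.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RepeatableCleanupTestJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Enable checkpointing with a reasonably short interval so that
        // checkpoint artifacts accumulate on S3 while the job is running.
        env.enableCheckpointing(10_000L);

        // Hypothetical bucket/path; point this at the S3 storage whose access
        // will later be revoked to provoke cleanup retries.
        env.getCheckpointConfig().setCheckpointStorage("s3://flink-test-artifacts/checkpoints");

        // Trivial, effectively never-ending pipeline just to keep the job running.
        env.fromSequence(0L, Long.MAX_VALUE).print();

        env.execute("repeatable-cleanup-test");
    }
}
```

Once the job is running and S3 access has been disabled, it can be stopped via the CLI, e.g. `bin/flink stop --savepointPath <target-directory> <jobId>`, after which the cleanup retries should become visible in the logs.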
Keep in mind that if testing this with HA enabled, you should use a separate bucket for the file-based JRS artifacts and only change permissions for the bucket that holds the JRS-unrelated artifacts (see the configuration sketch below). Flink will fail fatally if the JRS is not able to access its backend storage.
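The bucket split could look roughly like the following sketch. The bucket names and paths are made up for illustration, these options would normally be set in the cluster configuration (flink-conf.yaml) rather than built programmatically, and the option keys should be double-checked against the documentation for the Flink version under test.

```java
import org.apache.flink.configuration.Configuration;

/**
 * Hedged illustration of keeping the file-based JRS artifacts in a bucket
 * separate from the JRS-unrelated artifacts.
 */
public class JrsBucketSplitSketch {

    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Bucket A: JRS-unrelated artifacts (checkpoints/savepoints). This is the
        // bucket whose permissions get revoked to provoke the cleanup retries.
        conf.setString("state.checkpoints.dir", "s3://flink-test-artifacts/checkpoints");
        conf.setString("state.savepoints.dir", "s3://flink-test-artifacts/savepoints");

        // Bucket B: file-based JobResultStore artifacts. Setting this explicitly
        // keeps the JRS reachable during the test instead of letting it default
        // to a path under the HA storage directory.
        conf.setString("job-result-store.storage-path", "s3://flink-jrs-artifacts/job-result-store");

        System.out.println(conf);
    }
}
```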
Documentation and configuration are still being updated in FLINK-26296 and FLINK-26331.
Attachments
Issue Links
- is caused by
  - FLINK-25433 Integrate retry strategy for cleanup stage (Closed)
- relates to
  - FLINK-26296 Add missing documentation (Resolved)
  - FLINK-26331 Make max retries configurable (Resolved)
- Testing discovered
  - FLINK-26488 FileSystem.listFiles is not implemented consistently (Open)
  - FLINK-26606 CompletedCheckpoints that failed to be discarded are not stored in the CompletedCheckpointStore (Open)
  - FLINK-26450 FileStateHandle.discardState does not process return value (Resolved)
  - FLINK-26484 FileSystem.delete is not implemented consistently (Resolved)
  - FLINK-26494 Missing logs during retry (Resolved)
- links to