Details
- Bug
- Status: Open
- Critical
- Resolution: Unresolved
- 1.15.0
- None
- None
Description
EmbeddedHaServices (and EmbeddedHaServicesWithLeadershipControl) provide leader election that works within a single JVM. In FLINK-25235, we changed TestingMiniCluster to instantiate HighAvailabilityServices once per JobManager (i.e. per DispatcherResourceManagerComponent). This allows the HighAvailabilityServices to be closed when a JM shuts down rather than only when the HA cluster shuts down, which is closer to a production environment where each JM has its own HAServices instance. That became crucial as part of FLINK-24038, which revokes the leadership when the HAServices are closed during a JM shutdown.
The EmbeddedHaServices, though, provide a no-op StandaloneJobGraphStore implementation, so no real job recovery can be tested with the TestingMiniCluster (this was already the case before the change in FLINK-25235). We should still fix that so that users can use the TestingMiniCluster for such tests. That means providing a JobGraphStore and a JobResultStore that are shared between the different HighAvailabilityServices instances, and probably the checkpoint-related HA components as well.
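The idea of sharing the stores across per-JM HA services instances can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryJobStore` and `PerJobManagerHaServices` are hypothetical stand-ins and not Flink's `JobGraphStore` or `HighAvailabilityServices` interfaces, and job graphs are reduced to plain strings. The point it tries to show is that closing one JM's HA services must not discard the stored jobs, so that the next JM can recover them.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedStoreSketch {

    /** Hypothetical in-memory job store; NOT Flink's JobGraphStore interface. */
    static final class InMemoryJobStore {
        private final Map<String, String> jobs = new ConcurrentHashMap<>();

        void put(String jobId, String jobGraph) {
            jobs.put(jobId, jobGraph);
        }

        void remove(String jobId) {
            jobs.remove(jobId);
        }

        Collection<String> jobIds() {
            return List.copyOf(jobs.keySet());
        }

        /** Returns the stored job graph, or null if the job is unknown. */
        String recover(String jobId) {
            return jobs.get(jobId);
        }
    }

    /** Hypothetical per-JobManager HA services that only borrow the shared store. */
    static final class PerJobManagerHaServices implements AutoCloseable {
        private final InMemoryJobStore sharedStore;
        private boolean closed;

        PerJobManagerHaServices(InMemoryJobStore sharedStore) {
            this.sharedStore = sharedStore;
        }

        InMemoryJobStore getJobStore() {
            if (closed) {
                throw new IllegalStateException("HA services already closed");
            }
            return sharedStore;
        }

        /** Closing marks this instance unusable but leaves the shared store untouched. */
        @Override
        public void close() {
            closed = true;
        }
    }

    /** Simulates a JM shutdown followed by a second JM recovering the job. */
    static String simulateFailover() {
        InMemoryJobStore shared = new InMemoryJobStore();

        PerJobManagerHaServices first = new PerJobManagerHaServices(shared);
        first.getJobStore().put("job-1", "graph-of-job-1");
        first.close(); // first JM shuts down

        PerJobManagerHaServices second = new PerJobManagerHaServices(shared);
        String recovered = second.getJobStore().recover("job-1");
        second.close();
        return recovered;
    }

    public static void main(String[] args) {
        System.out.println(simulateFailover());
    }
}
```

In this sketch the shared store is owned by whoever creates the HA services instances (in the issue's terms, presumably the TestingMiniCluster), so its lifetime spans all JMs. A no-op store in its place would return nothing from `recover`, which is the limitation described above.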
Right now, the multi-JM setup of the TestingMiniCluster is only used in ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange, where it is backed by the ZooKeeperHAServices. Therefore, it is not a pressing issue for 1.15, but we should fix it as a follow-up.
Issue Links
- is caused by
  - FLINK-24038 DispatcherResourceManagerComponent fails to deregister application if no leading ResourceManager (Closed)
  - FLINK-25235 Re-enable ZooKeeperLeaderElectionITCase#testJobExecutionOnClusterWithLeaderChange (Resolved)
- is related to
  - FLINK-26502 Multiple component leader election has different close/stop behavior (Closed)
  - FLINK-26556 Refactoring MiniCluster and TestingMiniCluster (Open)
- relates to
  - FLINK-31816 Refactor EmbeddedLeaderElectionService (Open)