Flink / FLINK-26630

EmbeddedHaServices is not made for recovery on a single instance

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      EmbeddedHaServices (and EmbeddedHaServicesWithLeadershipControl) provide leader election within a single JVM. In FLINK-25235 we changed TestingMiniCluster to instantiate HighAvailabilityServices once per JobManager (i.e. per DispatcherResourceManagerComponent). This allows the HighAvailabilityServices to be closed when a JM shuts down rather than only when the HA cluster shuts down, which is closer to a production environment where each JM has its own HAServices instance. That became crucial as part of the work on FLINK-24038, which revokes leadership when the HAServices are closed during a JM shutdown.

      The EmbeddedHaServices, though, provide a no-op StandaloneJobGraphStore implementation, so no real recovery can be tested with the TestingMiniCluster (this was already the case before the change in FLINK-25235). We should still fix that so that users can use the TestingMiniCluster for recovery tests. That means providing a JobGraphStore and a JobResultStore that are shared between the different HighAvailabilityServices instances, and probably sharing the checkpoint-related HA components as well.
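
      The idea of sharing the stores across instances could look roughly like the sketch below. It is only an illustration under simplifying assumptions: SharedHaStateSketch, StoredJob, SharedJobStore and PerJobManagerHaServices are hypothetical stand-ins and not Flink classes, and the real JobGraphStore and JobResultStore interfaces have a larger surface than shown here. The point is that the store is owned outside the per-JM services object, so closing one JM's services does not discard what a second JM needs for recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these types are part of Flink. */
public class SharedHaStateSketch {

    /** Simplified stand-in for a persisted job graph (not Flink's JobGraph). */
    public static final class StoredJob {
        public final String jobId;
        public final String payload;

        public StoredJob(String jobId, String payload) {
            this.jobId = jobId;
            this.payload = payload;
        }
    }

    /** In-memory store that outlives any single per-JM HA services instance. */
    public static final class SharedJobStore {
        private final Map<String, StoredJob> jobs = new ConcurrentHashMap<>();

        public void put(StoredJob job) {
            jobs.put(job.jobId, job);
        }

        public Optional<StoredJob> recover(String jobId) {
            return Optional.ofNullable(jobs.get(jobId));
        }

        public void remove(String jobId) {
            jobs.remove(jobId);
        }

        public List<String> jobIds() {
            return new ArrayList<>(jobs.keySet());
        }
    }

    /** Stand-in for one JobManager's HA services; it borrows the shared store. */
    public static final class PerJobManagerHaServices implements AutoCloseable {
        private final SharedJobStore sharedStore;
        private boolean closed;

        public PerJobManagerHaServices(SharedJobStore sharedStore) {
            this.sharedStore = sharedStore;
        }

        public SharedJobStore getJobStore() {
            if (closed) {
                throw new IllegalStateException("HA services already closed");
            }
            return sharedStore;
        }

        /** Closes only this instance; the shared store is deliberately left intact. */
        @Override
        public void close() {
            closed = true;
        }
    }
}
```

      In such a setup the test cluster would create one SharedJobStore up front and pass it to every per-JM instance. A job stored through the first instance would then still be recoverable through a second instance after the first one has been closed.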

      Right now, the multi-JM setup of the TestingMiniCluster is only used in ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange, where it is bound to the ZooKeeperHAServices. It is therefore not a pressing issue for 1.15, but we should fix it as a follow-up.

            People

              Assignee: Unassigned
              Reporter: Matthias Pohl (mapohl)
              Votes: 0
              Watchers: 2