FLINK-9456

Let ResourceManager notify JobManager about failed/killed TaskManagers


Details

    Description

      The ResourceManager often learns about TaskManager failures or kills sooner than the JobManager does, because it communicates directly with the underlying resource management framework. Instead of relying only on the JobManager's heartbeat to detect that a TaskManager has died, the ResourceManager should additionally send a signal to the JobManager when a TaskManager dies. That way, we can react faster to TaskManager failures and recover the running jobs sooner.
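
      The proposal can be illustrated with a minimal, self-contained Java sketch. It is not Flink's actual implementation: the names TaskManagerFailureListener, SketchJobManager, SketchResourceManager, notifyTaskManagerFailed and onTaskManagerTerminated are hypothetical and chosen only for illustration. The sketch assumes the JobManager treats the heartbeat timeout and the ResourceManager signal as two paths into the same idempotent handler, so whichever arrives first triggers recovery and the later one is a no-op.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical callback that a JobManager exposes to the ResourceManager. */
interface TaskManagerFailureListener {
    void notifyTaskManagerFailed(String taskManagerId, String reason);
}

/** Simplified stand-in for a JobManager; not Flink's real class. */
class SketchJobManager implements TaskManagerFailureListener {
    private final Set<String> activeTaskManagers = new HashSet<>();
    private int recoveries = 0;

    void registerTaskManager(String taskManagerId) {
        activeTaskManagers.add(taskManagerId);
    }

    /** Existing path: the JobManager's own heartbeat to the TaskManager timed out. */
    void onHeartbeatTimeout(String taskManagerId) {
        handleTaskManagerLoss(taskManagerId);
    }

    /** Proposed additional path: the ResourceManager reports the failure directly. */
    @Override
    public void notifyTaskManagerFailed(String taskManagerId, String reason) {
        handleTaskManagerLoss(taskManagerId);
    }

    /** Idempotent: only the first report for a given TaskManager triggers recovery. */
    private void handleTaskManagerLoss(String taskManagerId) {
        if (activeTaskManagers.remove(taskManagerId)) {
            recoveries++; // placeholder for failing over the affected tasks
        }
    }

    boolean isActive(String taskManagerId) {
        return activeTaskManagers.contains(taskManagerId);
    }

    int recoveryCount() {
        return recoveries;
    }
}

/** Simplified stand-in for a ResourceManager; not Flink's real class. */
class SketchResourceManager {
    private final List<TaskManagerFailureListener> jobManagers = new ArrayList<>();

    void registerJobManager(TaskManagerFailureListener jobManager) {
        jobManagers.add(jobManager);
    }

    /** Assumed entry point: the resource management framework reports a dead container. */
    void onTaskManagerTerminated(String taskManagerId, String reason) {
        for (TaskManagerFailureListener jobManager : jobManagers) {
            jobManager.notifyTaskManagerFailed(taskManagerId, reason);
        }
    }
}
```

      In this sketch, calling onTaskManagerTerminated on the ResourceManager removes the TaskManager from the JobManager's active set immediately. A heartbeat timeout that arrives later for the same TaskManager does not trigger a second recovery.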


            People

              sihuazhou Sihua Zhou
              trohrmann Till Rohrmann
