Flink / FLINK-9583

Wrong number of TaskManager slots after recovery


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None
    • Environment: Flink 1.5.0 on YARN with the default execution mode.

    Description

      We started a job with 120 slots, using a FixedDelayRestart strategy with a delay of 1 minute.

      During recovery, some, but not all, slots were released.

      When the job restarts again, Flink requests a new batch of slots.

      The total number of slots is now 193, more than the configured 120, and the excess slots are never released.

       

      This bug does not occur in legacy mode. I've attached the JobManager log.

       

      Attachments

        1. jm.log
          634 kB
          Truong Duc Kien


            People

              Assignee: Unassigned
              Reporter: Truong Duc Kien (kien_truong)
              Votes: 0
              Watchers: 4
