Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 1.5.0
- None
- None
- Environment: Flink 1.5.0 on YARN with the default execution mode.
Description
We started a job with 120 slots, using a FixedDelayRestart strategy with a delay of 1 minute.
During recovery, some but not all of the slots were released.
When the job restarted, Flink requested a new batch of slots.
The total number of slots was then 193, larger than the configured amount, and the excess slots were never released.
This bug does not occur in legacy mode. I've attached the job manager log.
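For anyone trying to reproduce this, the setup described above might correspond roughly to the following flink-conf.yaml fragment. This is only a sketch: the report does not say whether the restart strategy was set in the configuration file or in the job code, and the attempt count is an assumed value that does not come from the report.

```yaml
# Sketch of a flink-conf.yaml matching the description (not taken from the attached log).
restart-strategy: fixed-delay
restart-strategy.fixed-delay.delay: 1 min
# Assumed value: the report does not state the number of restart attempts.
restart-strategy.fixed-delay.attempts: 3

# Flink 1.5 execution mode. "new" is the default, which is where the problem was seen;
# per the report, "legacy" does not show it.
mode: new
```

With 120 slots configured and 193 held after the restart, 73 slots would be left over and never returned.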
Attachments
Issue Links
- duplicates: FLINK-9635 Local recovery scheduling can cause spread out of tasks (Closed)