Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
1.0.0
None
Sprint: Mesosphere Sprint 37
3
Description
Currently, the scheduler library (src/scheduler/scheduler.cpp) does not have an artificially induced delay when it first tries to establish a connection with the master. After a master failover or a ZooKeeper (ZK) disconnect, a large number of frameworks can be disconnected at the same moment. They then all try to reconnect at once and overwhelm the master with TCP SYN requests.
On a large cluster with many agents, the master is already busy handling connection requests from the agents, so this burst of framework reconnections makes the problem worse.
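A common remedy is to have each client wait a random delay before (re)connecting, so that reconnection attempts are spread over a time window instead of reaching the master simultaneously. The sketch below is a minimal, hypothetical illustration of randomized exponential backoff ("full jitter"). The function `connectionDelay` and its `base`/`cap`/`attempt` parameters are assumptions made for illustration; they are not the Mesos scheduler library's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper (not part of Mesos): returns a delay drawn uniformly
// from [0, min(cap, base * 2^attempt)], i.e. exponential backoff with full
// jitter. Each disconnected framework would call this independently, so
// their reconnection attempts land at different times.
std::chrono::milliseconds connectionDelay(
    std::chrono::milliseconds base,
    std::chrono::milliseconds cap,
    unsigned attempt,
    std::mt19937_64& rng)
{
  assert(base.count() >= 0 && cap.count() >= 0);

  // Double the backoff window once per previous failed attempt,
  // stopping early once it reaches the cap (this also avoids overflow).
  int64_t window = base.count();
  for (unsigned i = 0; i < attempt && window < cap.count(); ++i) {
    window *= 2;
  }
  window = std::min<int64_t>(window, cap.count());

  // Pick a uniformly random point inside the window.
  std::uniform_int_distribution<int64_t> dist(0, window);
  return std::chrono::milliseconds(dist(rng));
}
```

For example, with a base of 1 s and a cap of 60 s, a framework's first attempt would wait somewhere in [0, 1 s], the second in [0, 2 s], and so on until the window stops growing at [0, 60 s]. Drawing from the whole window, rather than adding a small jitter to a fixed delay, spreads the reconnecting frameworks as evenly as possible.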
Issue Links
- is related to: MESOS-5330 Agent should backoff before connecting to the master (Resolved)