Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 1.5.0, 1.5.1, 1.5.2, 1.6.0, 1.6.1, 1.7.0, 1.7.1
- None
- None
- Mesos Foundations RI10 Sp 39
- 5
Description
MESOS-8782 and MESOS-8783 transition operations to OPERATION_GONE_BY_OPERATOR or OPERATION_UNREACHABLE when their agents are marked as gone or unreachable, respectively. However, there are other cases where agents can be "removed" and forgotten by the master:
1) When an agent tries to register with a new ID from the same IP:
https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L6836-L6849
2) When an agent requests to unregister:
https://github.com/apache/mesos/blob/f130544bdb8a9849096ee2cb35ebcbc7d8a326d8/src/master/master.cpp#L7817-L7840
In these cases, the master explicitly sends TASK_LOST status updates for the agent's tasks (which also means that this documentation is wrong), but does nothing for operations. We should design proper operation status transitions for these cases.
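
To make the gap concrete, the behavior described above can be summarized in a small C++ sketch. This is not Mesos code: the `AgentRemovalReason` enum and both functions are hypothetical names invented for illustration, and the statuses are plain strings rather than the real protobuf enums. An empty result means "no status update is sent" or, for tasks, "not covered by this description".

```cpp
#include <optional>
#include <string>

// Hypothetical enum for this sketch only; it is not a Mesos type.
enum class AgentRemovalReason {
  MARKED_GONE,         // Agent marked as gone (covered by MESOS-8782).
  MARKED_UNREACHABLE,  // Agent marked as unreachable (covered by MESOS-8783).
  REPLACED_BY_NEW_ID,  // Agent registered with a new ID from the same IP.
  UNREGISTERED         // Agent requested to unregister.
};

// Operation status the master transitions to when the agent is removed.
// Returns std::nullopt where, per the description, the master does nothing.
std::optional<std::string> operationStatusOnRemoval(AgentRemovalReason reason) {
  switch (reason) {
    case AgentRemovalReason::MARKED_GONE:
      return "OPERATION_GONE_BY_OPERATOR";
    case AgentRemovalReason::MARKED_UNREACHABLE:
      return "OPERATION_UNREACHABLE";
    case AgentRemovalReason::REPLACED_BY_NEW_ID:
    case AgentRemovalReason::UNREGISTERED:
      return std::nullopt;  // The gap this ticket is about.
  }
  return std::nullopt;
}

// Task status the master sends in the two cases discussed in this ticket.
// The other two cases are outside the scope of this description, so the
// sketch deliberately returns std::nullopt for them.
std::optional<std::string> taskStatusOnRemoval(AgentRemovalReason reason) {
  switch (reason) {
    case AgentRemovalReason::REPLACED_BY_NEW_ID:
    case AgentRemovalReason::UNREGISTERED:
      return "TASK_LOST";
    case AgentRemovalReason::MARKED_GONE:
    case AgentRemovalReason::MARKED_UNREACHABLE:
      return std::nullopt;  // Not described here.
  }
  return std::nullopt;
}
```

In this sketch, the two `std::nullopt` branches of `operationStatusOnRemoval` are exactly the cases for which a status transition still has to be designed.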
Issue Links
- is blocked by: MESOS-9556 Establish a well-defined agent state diagram (Accepted)
- is related to: MESOS-9546 Operation status is not updated in master when agent is marked as unreachable or gone (Resolved)