Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
A single reason in a status update message can make it very hard for frameworks to understand what caused the update. For example, we have REASON_EXECUTOR_TERMINATED, but that is a very general reason, and sometimes we want a sub-reason for it (e.g., REASON_CONTAINER_LAUNCH_FAILED) so that the framework can react to the status update more precisely.
We could change the 'reason' field in TaskStatus to be a repeated field (this should be backward compatible). For instance, for a containerizer launch failure, we probably need two reasons for TASK_LOST: 1) the top-level reason, REASON_EXECUTOR_TERMINATED; 2) the second-level reason, REASON_CONTAINER_LAUNCH_FAILED.
As another example, we may want a generic reason for when a resource limit is reached, REASON_RESOURCE_LIMIT_EXCEEDED, with a second-level sub-reason such as REASON_OUT_OF_MEMORY.
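To make the proposal concrete, here is a rough sketch of how a framework might consume a list of reasons instead of a single one. It is plain Python, not the Mesos API: the TaskStatus class below is a stand-in, and the names `reasons`, `top_level_reason`, and `react`, the "most general reason first" ordering, and the reaction strings are all assumptions made for illustration. REASON_RESOURCE_LIMIT_EXCEEDED and REASON_OUT_OF_MEMORY are the names proposed above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Reason(Enum):
    # Reason names taken from the description above.
    REASON_EXECUTOR_TERMINATED = auto()
    REASON_CONTAINER_LAUNCH_FAILED = auto()
    REASON_RESOURCE_LIMIT_EXCEEDED = auto()
    REASON_OUT_OF_MEMORY = auto()


@dataclass
class TaskStatus:
    """Stand-in for the real message; only the fields needed here."""
    state: str
    # Assumption: reasons are ordered from most general to most specific.
    reasons: List[Reason] = field(default_factory=list)

    @property
    def top_level_reason(self) -> Optional[Reason]:
        """The most general reason, or None if no reason was given."""
        return self.reasons[0] if self.reasons else None


def react(status: TaskStatus) -> str:
    """Hypothetical framework policy: prefer the most specific reason."""
    if Reason.REASON_CONTAINER_LAUNCH_FAILED in status.reasons:
        return "relaunch on another agent"
    if Reason.REASON_OUT_OF_MEMORY in status.reasons:
        return "relaunch with more memory"
    if status.top_level_reason is Reason.REASON_EXECUTOR_TERMINATED:
        return "relaunch"
    return "no action"


launch_failure = TaskStatus(
    state="TASK_LOST",
    reasons=[
        Reason.REASON_EXECUTOR_TERMINATED,
        Reason.REASON_CONTAINER_LAUNCH_FAILED,
    ],
)
print(react(launch_failure))
```

A framework that only understands the general reason can still act on `top_level_reason`, while a framework that knows the sub-reasons can make a finer-grained decision.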
Issue Links
- is related to: MESOS-7963 Task groups can lose the container limitation status. (Resolved)
- relates to: MESOS-2035 Add reason to containerizer proto Termination (Resolved)