Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
To reproduce:
- Run this one-liner on your slave to create 400 exited Docker containers:
for i in `seq 1 400`; do docker run busybox:latest echo "hello" ; done;
- Start mesos-slave with only the Mesos containerizer enabled
- Launch tasks that use an executor (which uses libmesos)
- Restart the mesos-slave process with --containerizers=docker,mesos
- Observe that mesos-slave forks "docker ps -a", which never returns
- Note that this mesos-slave never re-registers with the master
- Wait at least 10 minutes and watch the executors commit suicide, which kills all of the tasks on your system. From the executor log:
I0919 21:24:14.018127 21778 exec.cpp:379] Executor asked to shutdown
I0919 21:24:14.018812 21771 exec.cpp:78] Scheduling shutdown of the executor
I0919 21:24:14.020514 21778 exec.cpp:394] Executor::shutdown took 1.866382ms
I0919 21:24:16.000500 21771 exec.cpp:525] Executor sending status update TASK_KILLED (UUID: bfd3969c-ad0a-455a-93fe-06c37bdee513) for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000
I0919 21:24:16.030253 21772 exec.cpp:332] Ignoring status update acknowledgement bfd3969c-ad0a-455a-93fe-06c37bdee513 for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000 because the driver is aborted!
I0919 21:24:19.021966 21778 exec.cpp:86] Committing suicide by killing the process group
- mesos-slave fails to tell the master that the tasks were killed, logging this message:
W0918 01:02:57.252231 11725 status_update_manager.cpp:381] Not forwarding status update TASK_KILLED (UUID: 6fbacbcf-ad0f-4e89-89ee-e9f88a618573) for task 1410298578043-some-task-30-29279377-fdf2-4bb7-b862-852adddea09c of framework 20140522-213145-1749004561-5050-29512-0000 because no master is elected yet
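To check the "docker ps -a never returns" step without tying up your shell, a minimal sketch like the one below bounds the command with GNU coreutils `timeout`, which exits with status 124 when the limit expires. The function name check_returns_within and the limits you pass to it are hypothetical choices for illustration, not part of Mesos or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mesos): run a command with a time limit
# and report whether it returned, failed, or was killed by the limit.
# Assumes GNU coreutils `timeout`, which exits with 124 on expiry.
check_returns_within() {
  local limit="$1" status
  shift
  # Discard the command's output; only its exit status matters here.
  timeout "$limit" "$@" > /dev/null 2>&1
  status=$?
  case "$status" in
    0)   echo "returned" ;;
    124) echo "timed out after ${limit}s" ;;
    *)   echo "failed with exit status $status" ;;
  esac
}
```

For example, running `check_returns_within 60 docker ps -a` on the affected slave tells you whether the listing completes within a minute, independently of mesos-slave. `docker ps -a -q --filter status=exited | wc -l` can confirm that the exited containers from the first step are present.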