Mesos / MESOS-1824

When "docker ps -a" returns 400+ lines, enabling the docker containerizer results in all executors dying

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: containerization
    • Labels: None

    Description

      To reproduce:

      1. Run this one-liner on your slave to create 400 exited docker containers:
        for i in `seq 1 400`; do docker run busybox:latest echo "hello" ; done;
        
      2. Start mesos-slave with only mesos containerizer enabled
      3. Launch tasks that use an executor (which uses libmesos)
      4. Restart the mesos-slave process with --containerizers=docker,mesos
      5. See mesos-slave fork "docker ps -a", which never returns
      6. Note that this mesos-slave never reregisters with master
      7. Wait at least 10 minutes and see executors commit suicide, which kills all of the tasks on your system. From executor log:
        I0919 21:24:14.018127 21778 exec.cpp:379] Executor asked to shutdown
        I0919 21:24:14.018812 21771 exec.cpp:78] Scheduling shutdown of the executor
        I0919 21:24:14.020514 21778 exec.cpp:394] Executor::shutdown took 1.866382ms
        I0919 21:24:16.000500 21771 exec.cpp:525] Executor sending status update TASK_KILLED (UUID: bfd3969c-ad0a-455a-93fe-06c37bdee513) for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000
        I0919 21:24:16.030253 21772 exec.cpp:332] Ignoring status update acknowledgement bfd3969c-ad0a-455a-93fe-06c37bdee513 for task 1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 20140814-221057-1208029356-5050-10525-0000 because the driver is aborted!
        I0919 21:24:19.021966 21778 exec.cpp:86] Committing suicide by killing the process group
        
      8. mesos-slave fails to tell the master about the tasks being killed, with this message in the log:
        W0918 01:02:57.252231 11725 status_update_manager.cpp:381] Not forwarding status update TASK_KILLED (UUID: 6fbacbcf-ad0f-4e89-89ee-e9f88a618573) for task 1410298578043-some-task-30-29279377-fdf2-4bb7-b862-852adddea09c of framework 20140522-213145-1749004561-5050-29512-0000 because no master is elected yet
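
      The checks in the steps above can be sketched as two small shell helpers. This is only an illustration, assuming a POSIX shell: the function names ps_lines_at_least and count_failure_lines are hypothetical and are not part of Mesos or Docker. The first confirms that the "docker ps -a" output is at least as long as in the title (400 containers give 401 lines, since the output includes a header line). The second counts lines in a log file that match the messages quoted in steps 7 and 8.

```shell
# Hypothetical helper: succeeds when stdin has at least $1 lines.
ps_lines_at_least() {
    # wc -l may pad its output with spaces on some systems; strip them.
    count=$(wc -l | tr -d ' ')
    [ "$count" -ge "$1" ]
}

# Hypothetical helper: prints how many lines of the log file $1 contain
# either of the messages quoted in steps 7 and 8.
count_failure_lines() {
    grep -c -e 'Committing suicide by killing the process group' \
            -e 'because no master is elected yet' "$1"
}

# Usage on the slave (requires docker; the log path is a placeholder):
#   docker ps -a | ps_lines_at_least 400 && echo "400+ lines"
#   count_failure_lines /path/to/log
```

      Both helpers only read their input, so they can be run on a live slave without affecting the running executors.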
      

          People

            Assignee: Timothy Chen (tnachen)
            Reporter: Jay Buffington (jaybuff)