MESOS-5180

Scheduler driver does not detect disconnection with master and reregister.

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.24.0
    • Fix Version/s: None
    • Component/s: scheduler driver
    • 3

    Description

      The existing implementation of the scheduler driver does not re-register with the master under some network partition cases.

      When a scheduler registers with the master:
      1) master links to the framework
      2) framework links to the master

      Either of these links can break without the master changing. Currently, the scheduler driver re-registers only if the master changes.

      If both links break or if just link (1) breaks, the master views the framework as inactive and disconnected. This means the framework will not receive any more events (such as offers) from the master until it re-registers. There is currently no way for the scheduler to detect a one-way link breakage.
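      The cases above can be summarized as a small truth table. The sketch below is only an illustration of the description, not Mesos code: the `Links` struct and all function names are hypothetical, and it assumes the scheduler can only ever observe a break on its own link to the master, link (2).

```cpp
#include <cassert>

// Hypothetical model of the two links from the description.
// None of these names exist in Mesos or libprocess.
struct Links {
  bool masterToFramework;  // link (1) is intact
  bool frameworkToMaster;  // link (2) is intact
};

// Per the description, the master treats the framework as inactive and
// disconnected whenever link (1) is broken.
inline bool masterSeesFrameworkConnected(const Links& links) {
  return links.masterToFramework;
}

// Assumption: the scheduler side can only notice a break on the link it
// owns, which is link (2).
inline bool schedulerCanObserveBreak(const Links& links) {
  return !links.frameworkToMaster;
}

// The problematic case: the master has stopped sending events, but the
// scheduler has no local signal that anything is wrong.
inline bool silentlyStarved(const Links& links) {
  return !masterSeesFrameworkConnected(links) &&
         !schedulerCanObserveBreak(links);
}
```

      Under these assumptions, only `Links{false, true}` (link (1) broken, link (2) intact) yields `silentlyStarved(...) == true`, which is the one-way breakage the scheduler cannot detect.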

      If link (2) breaks, it makes (almost) no difference to the scheduler. The scheduler usually uses the link to send messages to the master, but libprocess will create another socket if the persistent one is not available.
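      A toy model of that fallback may help explain why sends keep working after link (2) breaks. This is not how libprocess is implemented: `ToySocket` and `ToySender` are hypothetical names, and whether libprocess reuses a replacement socket is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a connection; not a libprocess type.
struct ToySocket {
  int id;
  bool open;
  std::vector<std::string> sent;
};

class ToySender {
public:
  ToySender() : persistent_{0, true, {}} {}

  // Simulates link (2) breaking.
  void breakPersistent() { persistent_.open = false; }

  // Sends on the persistent socket while it is open; otherwise creates a
  // replacement socket for this message. Returns the id of the socket used.
  int send(const std::string& message) {
    if (persistent_.open) {
      persistent_.sent.push_back(message);
      return persistent_.id;
    }
    ToySocket replacement{nextId_++, true, {message}};
    replacements_.push_back(replacement);
    return replacement.id;
  }

  std::size_t replacementCount() const { return replacements_.size(); }

private:
  ToySocket persistent_;
  std::vector<ToySocket> replacements_;
  int nextId_ = 1;
};
```

      In this model the caller of `send` never sees a failure, which matches the observation that a broken link (2) goes unnoticed by the scheduler unless something reacts to the disconnection itself.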

      To handle breakage of both links (1+2) or of link (2) alone, the scheduler driver should implement an `::exited` event handler for the master's pid and trigger a master (re-)detection upon disconnection. This in turn should make the driver (re-)register with the master. The scheduler library already does this: https://github.com/apache/mesos/blob/master/src/scheduler/scheduler.cpp#L395
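      The proposed control flow can be sketched without libprocess. The code below is a simplified simulation, not the driver's or the scheduler library's actual code: `ToyDetector`, `ToyDriver`, and the string-based `Pid` are hypothetical stand-ins, and re-detection is modeled as a synchronous call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are not libprocess or Mesos types.
using Pid = std::string;

class ToyDetector {
public:
  explicit ToyDetector(Pid leader) : leader_(std::move(leader)) {}

  // Returns the current leading master, which may be unchanged.
  Pid detect() {
    ++detections;
    return leader_;
  }

  int detections = 0;

private:
  Pid leader_;
};

class ToyDriver {
public:
  explicit ToyDriver(ToyDetector& detector) : detector_(detector) {}

  void start() { detected(detector_.detect()); }

  // Analogue of an `exited` handler: if the pid that went away is the
  // current master, mark the driver disconnected and re-detect, even
  // though the leading master itself may not have changed.
  void exited(const Pid& pid) {
    if (master_.empty() || pid != master_) {
      return;  // Exit events for other pids are ignored.
    }
    connected_ = false;
    detected(detector_.detect());
  }

  bool connected() const { return connected_; }
  int registrations() const { return registrations_; }

private:
  // Every detection result leads to a (re-)registration in this model.
  void detected(const Pid& master) {
    master_ = master;
    connected_ = true;
    ++registrations_;
  }

  ToyDetector& detector_;
  Pid master_;
  bool connected_ = false;
  int registrations_ = 0;
};
```

      The point of the design is that re-registration is driven by the disconnection event rather than by a change of master, so a driver whose master stays the same still recovers after the link breaks.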

      See the related issue MESOS-5181 for link (1) breakage.

            People

              Unassigned
              Joseph Wu (kaysoky)
              Vinod Kone
              Votes: 0
              Watchers: 5
