MESOS-5650

UNRESERVE operation causes master to crash.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 0.28.1
    • Fix Version/s: None
    • Component/s: allocation
    • 5

    Description

      An UNRESERVE operation may crash the master:

      I0619 05:02:02.298602 11194 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49617 with User-Agent='python-requests/2.9.1'
      I0619 05:02:02.305542 11193 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49618 with User-Agent='python-requests/2.9.1'
      I0619 05:02:02.306731 11191 master.cpp:6560] Sending checkpointed resources mem(kafkatest-role, kafkatest-principal, {resource_id: 7408cc53-183c-48c2-a07f-7087806219f3}):256; cpus(kafkatest-role, kafkatest-principal, {resource_id: d7888099-db8f-4018-9109-f70fb1174f53}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: b5dd90fc-2c12-4199-9fc4-cf9f918e332b}):2304; ports(kafkatest-role, kafkatest-principal, {resource_id: a0ee4e01-803f-4b71-950d-483caeb01a57}):[9305-9305, 11596-11596]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 8cd72abb-7089-4220-bb90-46b70c9953ab}):0.5; disk(kafkatest-role, kafkatest-principal, {resource_id: ed06ec6e-2d15-4d0e-bbc4-95a942e58596})[]:11204 to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
      I0619 05:02:02.311069 11189 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49619 with User-Agent='python-requests/2.9.1'
      I0619 05:02:02.312191 11187 master.cpp:6560] Sending checkpointed resources cpus(kafkatest-role, kafkatest-principal, {resource_id: f1ff4806-0c24-4d60-ad2b-b06462ee4081}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cb8dc92d-64f0-4007-8520-1f63625b98c0}):2304; ports(kafkatest-role, kafkatest-principal, {resource_id: 225b4172-be77-453a-a94f-8845edc3f09a}):[9692-9692, 11824-11824]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 942e102a-ca63-480d-9853-9a39e2695ec9}):0.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cad57f8c-27f5-484c-a3fb-e80da74f0813}):256; disk(kafkatest-role, kafkatest-principal, {resource_id: e6563e09-e284-4aaf-8d53-72056695de41})[]:11204 to slave 489aa72f-ae07-4383-a56f-6fe9346ace37-S7 at slave(1)@10.0.0.7:5051 (10.0.0.7)
      I0619 05:02:02.316118 11189 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49620 with User-Agent='python-requests/2.9.1'
      I0619 05:02:02.321527 11189 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49621 with User-Agent='python-requests/2.9.1'
      I0619 05:02:02.323523 11193 master.cpp:6560] Sending checkpointed resources  to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
      I0619 05:02:02.327658 11191 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49622 with User-Agent='python-requests/2.9.1'
      F0619 05:02:02.329208 11190 sorter.cpp:284] Check failed: total_.scalarQuantities.contains(oldSlaveQuantity)
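
To make the order of operator requests before the fatal check easier to see, the excerpt above can be reduced to a short timeline. The following is a minimal sketch, not part of Mesos: it assumes the usual glog line prefix (severity letter, MMDD date, time, thread id, `file:line]`), and the names `Event`, `parse_line`, and `timeline` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed glog prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]\s]+:\d+)\] "
    r"(?P<message>.*)$"
)

# Matches the master's "HTTP <METHOD> for <path> from ..." request lines.
HTTP_RE = re.compile(r"HTTP (?P<method>GET|POST) for (?P<path>\S+) from")


class Event(NamedTuple):
    severity: str  # I, W, E, or F
    time: str      # timestamp as printed in the log
    source: str    # e.g. "sorter.cpp:284"
    summary: str   # "POST /master/unreserve" or the (shortened) message


def parse_line(line: str) -> Optional[Event]:
    """Parse one log line; return None if it does not look like glog output."""
    match = GLOG_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    http = HTTP_RE.search(message)
    if http:
        summary = f"{http.group('method')} {http.group('path')}"
    else:
        summary = message[:80]  # keep long resource dumps readable
    return Event(match.group("severity"), match.group("time"),
                 match.group("source"), summary)


def timeline(log_text: str) -> List[Event]:
    """Return the parsed events of a log excerpt in their original order."""
    events = (parse_line(line) for line in log_text.splitlines())
    return [event for event in events if event is not None]
```

Filtering the result for `POST` summaries and severity `F` shows the pattern in this report: two `/master/destroy-volumes` requests, two `/master/unreserve` requests, then the failed check in `sorter.cpp:284`.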
      

      Possible reasons:

      • Recent improvements to the allocator (b4d746f)
      • A bookkeeping bug during the previous UNRESERVE
      • A network partition that occurred after RESERVE and before UNRESERVE
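
The failed check reads as a containment invariant: before an agent's resources are updated, the cluster-wide total must still contain the agent's old quantities. The toy model below is only a sketch of that kind of invariant, not the Mesos sorter code and not a diagnosis of this crash. `ToySorter`, its methods, and the `(name, role)` keys are all hypothetical. It shows how a second update computed from a stale view of the agent could trip such a check.

```python
from collections import Counter
from typing import Dict, Tuple

Key = Tuple[str, str]          # hypothetical key: (resource name, role)
Quantities = Dict[Key, float]  # scalar amount per key


def contains(total: Counter, wanted: Quantities) -> bool:
    """True if `total` holds at least `wanted` for every key."""
    return all(total.get(key, 0) >= amount for key, amount in wanted.items())


class ToySorter:
    """Toy bookkeeping: a running total of scalar quantities across agents."""

    def __init__(self) -> None:
        self.total: Counter = Counter()

    def add(self, quantities: Quantities) -> None:
        self.total.update(quantities)

    def update(self, old: Quantities, new: Quantities) -> None:
        # Invariant modeled on the failed check: the total must still
        # contain the quantities we are about to replace.
        if not contains(self.total, old):
            raise AssertionError("total does not contain old quantities")
        self.total.subtract(old)
        self.total.update(new)


def unreserve_twice_from_stale_view() -> bool:
    """Apply the same unreserve twice; return True if the second one fails."""
    reserved = {("cpus", "kafkatest-role"): 1.5}
    unreserved = {("cpus", "*"): 1.5}
    sorter = ToySorter()
    sorter.add(reserved)
    sorter.update(old=reserved, new=unreserved)      # first unreserve succeeds
    try:
        sorter.update(old=reserved, new=unreserved)  # stale "old" view
    except AssertionError:
        return True
    return False
```

In this model the first update succeeds and the second raises, because the reserved quantity was already moved out of the total. Whether the real failure follows this path depends on how the master derives the old and new resources for each request, which the log alone does not show.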

            People

              neilc Neil Conway
              alexr Alex R
              Alex R Alex R