Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Duplicate
- 0.28.1
- None
- 5
Description
RESERVE operation may cause a master failure:
I0619 05:02:02.298602 11194 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49617 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.305542 11193 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49618 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.306731 11191 master.cpp:6560] Sending checkpointed resources mem(kafkatest-role, kafkatest-principal, {resource_id: 7408cc53-183c-48c2-a07f-7087806219f3}):256; cpus(kafkatest-role, kafkatest-principal, {resource_id: d7888099-db8f-4018-9109-f70fb1174f53}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: b5dd90fc-2c12-4199-9fc4-cf9f918e332b}):2304; ports(kafkatest-role, kafkatest-principal, {resource_id: a0ee4e01-803f-4b71-950d-483caeb01a57}):[9305-9305, 11596-11596]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 8cd72abb-7089-4220-bb90-46b70c9953ab}):0.5; disk(kafkatest-role, kafkatest-principal, {resource_id: ed06ec6e-2d15-4d0e-bbc4-95a942e58596})[]:11204 to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
I0619 05:02:02.311069 11189 http.cpp:312] HTTP POST for /master/destroy-volumes from 172.17.0.4:49619 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.312191 11187 master.cpp:6560] Sending checkpointed resources cpus(kafkatest-role, kafkatest-principal, {resource_id: f1ff4806-0c24-4d60-ad2b-b06462ee4081}):1.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cb8dc92d-64f0-4007-8520-1f63625b98c0}):2304; ports(kafkatest-role, kafkatest-principal, {resource_id: 225b4172-be77-453a-a94f-8845edc3f09a}):[9692-9692, 11824-11824]; cpus(kafkatest-role, kafkatest-principal, {resource_id: 942e102a-ca63-480d-9853-9a39e2695ec9}):0.5; mem(kafkatest-role, kafkatest-principal, {resource_id: cad57f8c-27f5-484c-a3fb-e80da74f0813}):256; disk(kafkatest-role, kafkatest-principal, {resource_id: e6563e09-e284-4aaf-8d53-72056695de41})[]:11204 to slave 489aa72f-ae07-4383-a56f-6fe9346ace37-S7 at slave(1)@10.0.0.7:5051 (10.0.0.7)
I0619 05:02:02.316118 11189 http.cpp:312] HTTP GET for /master/slaves from 172.17.0.4:49620 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.321527 11189 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49621 with User-Agent='python-requests/2.9.1'
I0619 05:02:02.323523 11193 master.cpp:6560] Sending checkpointed resources to slave a80ff9dd-e046-43ab-b763-28365b136f6b-S0 at slave(1)@10.0.0.5:5051 (10.0.0.5)
I0619 05:02:02.327658 11191 http.cpp:312] HTTP POST for /master/unreserve from 172.17.0.4:49622 with User-Agent='python-requests/2.9.1'
F0619 05:02:02.329208 11190 sorter.cpp:284] Check failed: total_.scalarQuantities.contains(oldSlaveQuantity)
Possible reasons:
- Recent improvements in the allocator (b4d746f)
- Bug in bookkeeping during the previous UNRESERVE
- Network partition that happened after RESERVE and before UNRESERVE
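To make the bookkeeping hypothesis concrete, here is a minimal Python sketch of how a "contains the old quantity" check can trip when one tracker misses an update. It is a toy model under stated assumptions, not the Mesos code: `ToySorter`, `simulate_stale_sorter`, the two sorter names, and the resource numbers are all hypothetical, and an exception stands in for the fatal CHECK that aborts the master.

```python
from collections import Counter


class ToySorter:
    """Hypothetical stand-in for a sorter that tracks total scalar quantities."""

    def __init__(self):
        self.total = Counter()  # resource name -> total quantity

    def contains(self, quantity):
        # True if every requested amount is covered by the tracked total.
        return all(self.total[name] >= amount for name, amount in quantity.items())

    def add(self, quantity):
        self.total.update(quantity)

    def update(self, old_quantity, new_quantity):
        # Same shape as the failing check in the log: the caller's idea of the
        # agent's old quantity must already be part of the tracked total.
        if not self.contains(old_quantity):
            raise AssertionError(
                "Check failed: total_.scalarQuantities.contains(oldSlaveQuantity)")
        self.total.subtract(old_quantity)
        self.total.update(new_quantity)


def simulate_stale_sorter():
    """Return the failure message if the stale sorter trips the check, else None."""
    role_sorter, quota_sorter = ToySorter(), ToySorter()
    agent_total = {"cpus": 2.0, "mem": 2560}
    for sorter in (role_sorter, quota_sorter):
        sorter.add(agent_total)

    # The agent's total changes, but only one sorter is told (the assumed bug).
    grown_total = {"cpus": 4.0, "mem": 2560}
    role_sorter.update(agent_total, grown_total)

    # A later operation passes the current total as the "old" quantity.
    role_sorter.update(grown_total, grown_total)  # fine: this sorter is current
    try:
        quota_sorter.update(grown_total, grown_total)  # stale: never saw 4 cpus
    except AssertionError as error:
        return str(error)
    return None
```

Calling `simulate_stale_sorter()` returns the check-failure message, because the second sorter still records 2 cpus when it is asked to replace 4. Whether this is the path taken in the log above is not established by the sketch; it only shows why a missed update is enough to violate the invariant.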
Issue Links
- duplicates MESOS-5698 Quota sorter not updated for resource changes at agent. (Resolved)