Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 3.2.0
Labels: None
Description
We found this problem when the cluster was almost, but not completely, exhausted (93% used): the scheduler kept allocating for an app but always failed to commit. This can block requests from other apps and leave part of the cluster's resources unusable.
To reproduce this problem:
(1) use DominantResourceCalculator (see the configuration snippet after this list);
(2) the cluster resource has an empty resource type, for example gpu=0;
(3) the scheduler allocates a container for app1, which has reserved containers and whose queue limit or user limit is reached (used + required > limit).
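For step (1), a typical way to select the DominantResourceCalculator is the standard CapacityScheduler setting in capacity-scheduler.xml, shown here only to make the setup concrete:

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>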
Reference code in RegularContainerAllocator#assignContainer:
// How much need to unreserve equals to:
// max(required - headroom, amountNeedUnreserve)
Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
Resource resourceNeedToUnReserve = Resources.max(rc, clusterResource,
    Resources.subtract(capability, headRoom),
    currentResoureLimits.getAmountNeededUnreserve());
boolean needToUnreserve = Resources.greaterThan(rc, clusterResource,
    resourceNeedToUnReserve, Resources.none());
For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when headRoom = <0GB, 8 vcores, 0 gpu> and capability = <8GB, 2 vcores, 0 gpu>; needToUnreserve, the result of Resources#greaterThan, will then be false. This is not reasonable, because the required resource did exceed the headroom, so an unreserve is needed.
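To make the arithmetic concrete, here is a small standalone sketch (plain Java with long[] vectors and a hypothetical UnreserveMathSketch class, not the actual Hadoop Resource/Resources types). It reproduces the subtraction above and shows that a per-component view of the result does indicate the request exceeds the headroom, even though the single scalar comparison against Resources.none() came out false:

import java.util.Arrays;

public class UnreserveMathSketch {

  // Component-wise a - b, mirroring what Resources.subtract does per
  // resource type; vectors are [memoryMB, vcores, gpus].
  static long[] subtract(long[] a, long[] b) {
    long[] out = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[i] - b[i];
    }
    return out;
  }

  public static void main(String[] args) {
    long[] headRoom   = {0, 8, 0};        // <0GB, 8 vcores, 0 gpu>
    long[] capability = {8 * 1024, 2, 0}; // <8GB, 2 vcores, 0 gpu>

    long[] need = subtract(capability, headRoom);
    System.out.println(Arrays.toString(need)); // [8192, -6, 0]

    // Checking components directly: the memory component is positive, so
    // the request exceeds the headroom and an unreserve is needed, even
    // though the vcores component is negative and the gpu component is 0.
    boolean exceedsHeadroom = false;
    for (long v : need) {
      if (v > 0) {
        exceedsHeadroom = true;
      }
    }
    System.out.println(exceedsHeadroom); // true
  }
}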
After that, when execution reaches the unreserve branch in RegularContainerAllocator#assignContainer, the unreserve process is skipped because shouldAllocOrReserveNewContainer is true (required containers > reserved containers) and needToUnreserve was wrongly calculated to be false:
if (availableContainers > 0) {
  if (rmContainer == null && reservationsContinueLooking
      && node.getLabels().isEmpty()) {
    // The unreserve process can be wrongly skipped when
    // shouldAllocOrReserveNewContainer=true and needToUnreserve=false,
    // even though the required resource did exceed the headroom.
    if (!shouldAllocOrReserveNewContainer || needToUnreserve) {
      ...
    }
  }
}
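Plugging this scenario's values into that guard makes the skip explicit (a minimal sketch, assuming the values derived above):

boolean shouldAllocOrReserveNewContainer = true;  // required containers > reserved containers
boolean needToUnreserve = false;                  // miscomputed, as shown above
// Guard from the snippet above:
boolean enterUnreserveBranch =
    !shouldAllocOrReserveNewContainer || needToUnreserve; // false
// The unreserve branch is never entered, so the reservation is kept and
// the allocation keeps failing to commit on each scheduling attempt.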