Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
This problem was introduced by YARN-4270, which adds a limit on reservations.
In FSAppAttempt.reserve():
if (!reservationExceedsThreshold(node, type)) {
  LOG.info("Making reservation: node=" + node.getNodeName()
      + " app_id=" + getApplicationId());
  if (!alreadyReserved) {
    getMetrics().reserveResource(getUser(), container.getResource());
    RMContainer rmContainer = super.reserve(node, priority, null, container);
    node.reserveResource(this, priority, rmContainer);
    setReservation(node);
  } else {
    RMContainer rmContainer = node.getReservedContainer();
    super.reserve(node, priority, rmContainer, container);
    node.reserveResource(this, priority, rmContainer);
    setReservation(node);
  }
}
If the reservation exceeds the threshold, no reservation is set on the current node.
But in attemptScheduling in FairScheduler:
while (node.getReservedContainer() == null) {
  boolean assignedContainer = false;
  if (!queueMgr.getRootQueue().assignContainer(node).equals(
      Resources.none())) {
    assignedContainers++;
    assignedContainer = true;
  }
  if (!assignedContainer) { break; }
  if (!assignMultiple) { break; }
  if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
}
assignContainer(node) still returns FairScheduler.CONTAINER_RESERVED, which is not equal to Resources.none(), so the call is counted as a successful assignment even though no reservation was made.
As a result, if assignMultiple is enabled and maxAssign is unlimited, node.getReservedContainer() stays null and this while loop never breaks.
I suppose that assignContainer(node) should return Resources.none() rather than CONTAINER_RESERVED when the attempt does not take the reservation because of the limit.
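The interaction described above can be reproduced with a small stand-alone model. This is only a sketch, not the Hadoop code: the class ReservationLoopSketch, the Node type, the string constants standing in for Resources.none() and CONTAINER_RESERVED, the overThreshold and fixed flags, and the safetyCap guard are all hypothetical and exist only so the example terminates and can be run without YARN.

```java
public class ReservationLoopSketch {
    // Hypothetical stand-ins for Resources.none() and FairScheduler.CONTAINER_RESERVED.
    static final String NONE = "none";
    static final String CONTAINER_RESERVED = "reserved";

    /** Simplified node that records only whether it holds a reservation. */
    static final class Node {
        boolean reserved = false;
    }

    /**
     * Models an assignment attempt that can only end in a reservation.
     * When overThreshold is true the reservation is refused and the node is
     * left untouched. The "fixed" flag selects the proposed behaviour
     * (return NONE) instead of the reported one (return CONTAINER_RESERVED).
     */
    static String assignContainer(Node node, boolean overThreshold, boolean fixed) {
        if (!overThreshold) {
            node.reserved = true;
            return CONTAINER_RESERVED;
        }
        return fixed ? NONE : CONTAINER_RESERVED;
    }

    /**
     * Mirrors the shape of the loop in attemptScheduling with assignMultiple
     * enabled and maxAssign unlimited. safetyCap is not part of the original
     * loop; it only keeps this sketch from spinning forever.
     * Returns the number of loop iterations executed.
     */
    static int attemptScheduling(Node node, boolean overThreshold, boolean fixed,
                                 int safetyCap) {
        final boolean assignMultiple = true;
        final int maxAssign = -1; // unlimited
        int assignedContainers = 0;
        int iterations = 0;
        while (!node.reserved) {
            if (iterations >= safetyCap) {
                break; // sketch-only guard
            }
            iterations++;
            boolean assignedContainer = false;
            if (!assignContainer(node, overThreshold, fixed).equals(NONE)) {
                assignedContainers++;
                assignedContainer = true;
            }
            if (!assignedContainer) { break; }
            if (!assignMultiple) { break; }
            if ((assignedContainers >= maxAssign) && (maxAssign > 0)) { break; }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("reported behaviour: "
            + attemptScheduling(new Node(), true, false, 1000) + " iterations");
        System.out.println("proposed behaviour: "
            + attemptScheduling(new Node(), true, true, 1000) + " iterations");
    }
}
```

With the reported behaviour the loop runs until the sketch's cap of 1000 iterations is hit, because the node never receives a reservation and no break condition applies. With the proposed return value the loop exits after a single iteration.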
Attachments
Issue Links
- relates to YARN-4270 Limit application resource reservation on nodes for non-node/rack specific requests (Resolved)