Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Consider a job that ran with host affinity turned on, so its locality info was written to the coordinator stream, and for which the user later turns host affinity off.
This triggers a bug in ContainerAllocator:
1) ContainerAllocator gets the locality map from JobModel, which still holds the list of preferred hosts read from the coordinator stream. As a result, it makes preferred-host resource requests even though host affinity is off.
2) After launching all containers, ContainerAllocator tries to release the extra containers mapped to ANY_HOST. However, the responses to preferred-host requests are stored under each specific host's entry, so it fails to release those containers.
The end result: the job still launches successfully, but the YARN RM reports a large amount of reserved memory/containers that the job never releases. In extreme cases, the reserved memory/containers can be large enough to affect the availability of the whole cluster.
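The mismatch between where extra containers are stored and where they are released can be illustrated with a minimal sketch. This is not Samza's actual code: the class `AllocatedContainerBuffer` and its methods are hypothetical names, and the per-host map is only an assumed stand-in for however ContainerAllocator tracks allocated containers. `releaseAll` shows one possible way to avoid the leak, not necessarily the fix that was applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a buffer of allocated-but-unused containers, keyed by host.
class AllocatedContainerBuffer {
    public static final String ANY_HOST = "ANY_HOST";

    // host name -> IDs of containers granted for that host and not yet used
    private final Map<String, List<String>> allocated = new HashMap<>();

    public void addContainer(String host, String containerId) {
        allocated.computeIfAbsent(host, h -> new ArrayList<>()).add(containerId);
    }

    // Mirrors the buggy behavior described above: only the ANY_HOST entry is released,
    // so containers stored under a specific host's entry stay reserved.
    public List<String> releaseAnyHostOnly() {
        List<String> released = allocated.remove(ANY_HOST);
        return released == null ? new ArrayList<>() : released;
    }

    // One possible remedy (an assumption): release the extra containers of every entry.
    public List<String> releaseAll() {
        List<String> released = new ArrayList<>();
        for (List<String> containers : allocated.values()) {
            released.addAll(containers);
        }
        allocated.clear();
        return released;
    }

    // Number of containers still held and therefore still reserved in the cluster.
    public int outstanding() {
        return allocated.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        AllocatedContainerBuffer buffer = new AllocatedContainerBuffer();
        buffer.addContainer("host-1", "container-1"); // preferred-host response
        buffer.addContainer("host-2", "container-2"); // preferred-host response
        buffer.addContainer(ANY_HOST, "container-3");

        buffer.releaseAnyHostOnly();
        System.out.println("Leaked after ANY_HOST-only release: " + buffer.outstanding());

        buffer.releaseAll();
        System.out.println("Leaked after releasing every entry: " + buffer.outstanding());
    }
}
```

In this sketch, the two containers stored under `host-1` and `host-2` survive the ANY_HOST-only release, which corresponds to the reserved memory/containers that the YARN RM keeps reporting.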