Details
- Sub-task
- Status: Open
- Major
- Resolution: Unresolved
- 2.2.0, 2.3.0
- None
- None
- SuSE 11 SP2 + Hadoop-2.3
Description
The cluster has 4 NodeManagers with 8 GB each, for a total cluster capacity of 32 GB. Slow start is set to 1.
A job is running whose reducer tasks occupy 29 GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster.
The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the memory of blacklisted nodes. As a result, the job hangs forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the available resource it returns still counts the cluster's free memory, including memory on those nodes.
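The arithmetic behind the hang can be sketched as follows. This is an illustrative model only, not the MRAppMaster or ResourceManager code: the `Node` class, the per-node memory distribution (NM-1 to NM-3 full, 5 GB used on NM-4), the 1 GB pending map request, and the simplified rule "preempt reducers when headroom is smaller than the pending map request" are all assumptions chosen to match the totals in the description.

```java
import java.util.Arrays;
import java.util.List;

public class HeadroomSketch {

    /** Hypothetical node model (not a YARN class). Memory is in MB. */
    static final class Node {
        final String name;
        final int capacityMb;
        final int usedMb;
        final boolean blacklisted;

        Node(String name, int capacityMb, int usedMb, boolean blacklisted) {
            this.name = name;
            this.capacityMb = capacityMb;
            this.usedMb = usedMb;
            this.blacklisted = blacklisted;
        }

        int freeMb() {
            return capacityMb - usedMb;
        }
    }

    /** Assumed layout: 4 x 8 GB nodes, 29 GB used by reducers, NM-4 blacklisted. */
    static List<Node> exampleCluster() {
        return Arrays.asList(
            new Node("NM-1", 8192, 8192, false),
            new Node("NM-2", 8192, 8192, false),
            new Node("NM-3", 8192, 8192, false),
            new Node("NM-4", 8192, 5120, true));
    }

    /** Headroom as described in the report: free memory on every node. */
    static int naiveHeadroomMb(List<Node> nodes) {
        int total = 0;
        for (Node n : nodes) {
            total += n.freeMb();
        }
        return total;
    }

    /** Headroom the application can actually use: blacklisted nodes excluded. */
    static int usableHeadroomMb(List<Node> nodes) {
        int total = 0;
        for (Node n : nodes) {
            if (!n.blacklisted) {
                total += n.freeMb();
            }
        }
        return total;
    }

    /** Simplified preemption rule (assumption): preempt when a pending map does not fit. */
    static boolean shouldPreemptReducers(int headroomMb, int pendingMapMb) {
        return headroomMb < pendingMapMb;
    }

    public static void main(String[] args) {
        List<Node> nodes = exampleCluster();
        int pendingMapMb = 1024; // hypothetical size of one pending map request

        int naive = naiveHeadroomMb(nodes);
        int usable = usableHeadroomMb(nodes);

        System.out.println("naive headroom (MB):  " + naive);
        System.out.println("usable headroom (MB): " + usable);
        System.out.println("preempt with naive headroom?  "
            + shouldPreemptReducers(naive, pendingMapMb));
        System.out.println("preempt with usable headroom? "
            + shouldPreemptReducers(usable, pendingMapMb));
    }
}
```

Under these assumptions the reported headroom is large enough for the pending map, so no reducer is preempted, yet all of that free memory sits on the blacklisted node and the map can never be scheduled. Excluding blacklisted nodes drops the headroom to zero, which would trigger preemption.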
Attachments
Issue Links
- Is contained by: YARN-2848 (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit (Resolved)
- Is duplicated by: MAPREDUCE-5928 Deadlock allocating containers for mappers and reducers (Resolved)
- Is related to: MAPREDUCE-6302 Preempt reducers after a configurable timeout irrespective of headroom (Closed)