[YARN-1680] availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.2.0, 2.3.0
Fix Version/s: None
Component/s: capacityscheduler
Labels: None
Environment: SuSE 11 SP2 + Hadoop-2.3

Description

There are 4 NodeManagers with 8 GB each, so total cluster capacity is 32 GB. Cluster slow start is set to 1.

A job's running reducer tasks occupy 29 GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster.

The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes the job hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts all of the cluster's free memory.
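
The arithmetic behind the hang can be sketched as follows. This is a simplified illustration, not ResourceManager code: the class, the method names, and the per-node usage figures are hypothetical. It assumes headroom is simply the sum of free memory across nodes, and that NM-1 through NM-3 are full while NM-4 holds the remaining 5 GB of the 29 GB in use, which is the layout in which no map can be placed anywhere else.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names are Hadoop APIs.
final class HeadroomSketch {

    // Illustrative per-node view of memory, in GB.
    static final class Node {
        final String id;
        final int totalGb;
        final int usedGb;

        Node(String id, int totalGb, int usedGb) {
            this.id = id;
            this.totalGb = totalGb;
            this.usedGb = usedGb;
        }

        int freeGb() {
            return totalGb - usedGb;
        }
    }

    // Behavior described in the report: free memory summed over every node.
    static int headroomAllNodesGb(List<Node> nodes) {
        int free = 0;
        for (Node n : nodes) {
            free += n.freeGb();
        }
        return free;
    }

    // Behavior the issue title asks for: skip nodes the application blacklisted.
    static int headroomExcludingBlacklistedGb(List<Node> nodes, Set<String> blacklisted) {
        int free = 0;
        for (Node n : nodes) {
            if (!blacklisted.contains(n.id)) {
                free += n.freeGb();
            }
        }
        return free;
    }

    public static void main(String[] args) {
        // Assumed layout: NM-1..NM-3 are full, NM-4 holds the remaining 5 GB of usage.
        List<Node> nodes = Arrays.asList(
            new Node("NM-1", 8, 8),
            new Node("NM-2", 8, 8),
            new Node("NM-3", 8, 8),
            new Node("NM-4", 8, 5));
        Set<String> blacklisted = Collections.singleton("NM-4");

        System.out.println(headroomAllNodesGb(nodes));                          // prints 3
        System.out.println(headroomExcludingBlacklistedGb(nodes, blacklisted)); // prints 0
    }
}
```

Under these assumptions the application is told it has 3 GB of headroom, so it sees no reason to preempt reducers, yet every free gigabyte sits on the node it has blacklisted. With the blacklisted node excluded the headroom would be 0, which is the signal the reducer preemption calculation needs.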

Attachments

YARN-1680-WIP.patch, 01/Oct/14 06:38, 7 kB, Chen He
YARN-1680-v2.patch, 08/Jul/14 16:13, 20 kB, Chen He
YARN-1680-v2.patch, 27/May/14 16:41, 20 kB, Chen He
YARN-1680.patch, 20/May/14 16:35, 18 kB, Chen He

Issue Links

Is contained by: YARN-2848 (FICA) Applications should maintain an application specific 'cluster' resource to calculate headroom and userlimit (Resolved)

Is duplicated by: MAPREDUCE-5928 Deadlock allocating containers for mappers and reducers (Resolved)

Is related to: MAPREDUCE-6302 Preempt reducers after a configurable timeout irrespective of headroom (Closed)

People

Assignee: Unassigned
Reporter: Rohith Sharma K S
Votes: 1
Watchers: 27

Dates

Created: 24/Jan/14 08:21
Updated: 07/Jan/17 01:52