Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 2.1.0-beta
Labels: None
Hadoop Flags: Reviewed
Description
I had 6 nodes in a cluster, with 2 NMs stopped. Then I added a host to the exclude file configured by yarn.resourcemanager.nodes.exclude-path. After running yarn rmadmin -refreshNodes, the RM's JMX correctly showed the decommissioned node count:
"NumActiveNMs" : 3, "NumDecommissionedNMs" : 1, "NumLostNMs" : 2, "NumUnhealthyNMs" : 0, "NumRebootedNMs" : 0
After restarting the RM, JMX showed the counts below:
"NumActiveNMs" : 3, "NumDecommissionedNMs" : 0, "NumLostNMs" : 0, "NumUnhealthyNMs" : 0, "NumRebootedNMs" : 0
Notice that the lost and decommissioned NM counts are both 0.
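For context, the node counts quoted above come from the ResourceManager's ClusterMetrics JMX bean. Below is a minimal sketch of reading that bean through the RM's /jmx servlet; the RM web address (localhost:8088) and the class name RmClusterMetricsDump are assumptions for illustration, not part of YARN.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Fetches the ResourceManager's ClusterMetrics bean from the /jmx servlet and
 * prints the raw JSON, which contains NumActiveNMs, NumDecommissionedNMs,
 * NumLostNMs, NumUnhealthyNMs and NumRebootedNMs.
 */
public class RmClusterMetricsDump {
    public static void main(String[] args) throws Exception {
        // The RM web address is an assumption; pass the real host:port as an argument.
        String rmWebApp = args.length > 0 ? args[0] : "http://localhost:8088";
        URL url = new URL(rmWebApp
                + "/jmx?qry=Hadoop:service=ResourceManager,name=ClusterMetrics");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON document containing the NM counts
            }
        } finally {
            conn.disconnect();
        }
    }
}
{code}
Running this before and after the RM restart makes the reset of NumDecommissionedNMs and NumLostNMs easy to compare.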
Attachments
Issue Links
- relates to YARN-2567: Add a percentage-node threshold for RM to wait for new allocations after restart/failover (Open)
- relates to AMBARI-2940: After restarting YARN, the number of lost nodes is incorrect (Resolved)