Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Today, whenever the RM receives a heartbeat from an NM, it computes a new heartbeat response and sends it back to the NM. Internally, this response is also sent to RMNodeImpl as an RMNodeEvent via the dispatcher queue. If for some reason the NM did not receive the older heartbeat response, it will heartbeat again. The RM will then compute another response (if it has not already handled the event from the queue) and add this duplicate response to the dispatcher queue. Today, completed applications are removed from RMNodeImpl while the response is being computed. So if the NM receives a response that no longer lists the finished applications, it will never realize that those applications have finished.
Solution:
- We should synchronously update the newly computed response.
- lastResponse should be moved out of RMNodeImpl and stored in ResourceTrackerService itself, just as is done in ApplicationMasterService.
- As in YARN-744, we should introduce locking while computing the response.
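
The three points above could be combined roughly as in the sketch below. This is only an illustration and not the actual YARN code: HeartbeatSketch, NodeSlot, Response, register, appFinished and heartbeat are all hypothetical names, node and application IDs are plain strings, and the response-id check is an assumption about how a retried heartbeat would be recognized. The idea is that the last response is kept per node in the service, updated synchronously under a per-node lock, and resent unchanged when the NM retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the heartbeat handling in ResourceTrackerService.
class HeartbeatSketch {

    /** Hypothetical, simplified heartbeat response. */
    static final class Response {
        final int responseId;
        final List<String> finishedApps;

        Response(int responseId, List<String> finishedApps) {
            this.responseId = responseId;
            this.finishedApps = finishedApps;
        }
    }

    /** Per-node state; the slot object itself is used as the lock. */
    static final class NodeSlot {
        Response last = new Response(0, new ArrayList<String>());
        final List<String> pendingFinishedApps = new ArrayList<String>();
    }

    // lastResponse lives here, in the service, rather than in the node object.
    private final ConcurrentMap<String, NodeSlot> slots =
        new ConcurrentHashMap<String, NodeSlot>();

    void register(String nodeId) {
        slots.putIfAbsent(nodeId, new NodeSlot());
    }

    void appFinished(String nodeId, String appId) {
        NodeSlot slot = slots.get(nodeId);
        synchronized (slot) {
            slot.pendingFinishedApps.add(appId);
        }
    }

    /** lastSeenResponseId is the id of the last response the NM received. */
    Response heartbeat(String nodeId, int lastSeenResponseId) {
        NodeSlot slot = slots.get(nodeId);
        synchronized (slot) {
            if (lastSeenResponseId + 1 == slot.last.responseId) {
                // The NM missed the previous response: resend it unchanged,
                // so the finished applications it carries are not lost.
                return slot.last;
            }
            if (lastSeenResponseId != slot.last.responseId) {
                throw new IllegalStateException("NM out of sync: " + nodeId);
            }
            // Compute the new response and store it synchronously,
            // all under the same per-node lock.
            Response next = new Response(
                slot.last.responseId + 1,
                new ArrayList<String>(slot.pendingFinishedApps));
            slot.pendingFinishedApps.clear();
            slot.last = next;
            return next;
        }
    }
}
```

With this shape, a retried heartbeat carrying the old response id gets the cached response back, including its finished applications, instead of triggering a second computation that would omit them.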
Issue Links
- relates to YARN-245: Node Manager can not handle duplicate responses (Open)