  1. Hadoop YARN
  2. YARN-958

NM may miss a heartbeat response from RM, resulting in missed finished-application information.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Today, whenever the RM receives a heartbeat from an NM, it computes a new heartbeat response and sends it back to the NM. Internally, this response is also sent to RMNodeImpl as an RMNodeEvent via the dispatcher queue. If for some reason the NM does not receive that response, it will heartbeat again. The RM will then compute another response (if it has not yet handled the earlier event from the queue) and add this duplicate response to the dispatcher queue. Because completed applications are removed from RMNodeImpl while the response is computed, the recomputed response no longer lists them. If the NM receives a response without the finished applications, it will never realize that those applications finished.
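The failure mode described above can be reproduced with a minimal sketch. `NodeStateSketch` is a hypothetical, heavily simplified stand-in for the RM-side node state, not the real RMNodeImpl; it only assumes, as the description states, that computing a response removes the completed applications from the node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RM-side per-node state (not the real RMNodeImpl).
class NodeStateSketch {
    private final List<String> finishedApps = new ArrayList<>();

    // Record that an application finished on this node.
    void appFinished(String appId) {
        finishedApps.add(appId);
    }

    // Computing a response drains the finished-application list,
    // so a second computation no longer contains those applications.
    List<String> computeResponse() {
        List<String> response = new ArrayList<>(finishedApps);
        finishedApps.clear();
        return response;
    }
}
```

If the first result of `computeResponse()` never reaches the NM, the response computed for the retried heartbeat is empty, and the finished applications are never reported.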

      Proposed solution:

      • We should update the newly computed response synchronously, rather than through the dispatcher queue.
      • lastResponse should be moved out of RMNodeImpl and stored in ResourceTrackerService itself, just as ApplicationMasterService does.
      • As in YARN-744, we should introduce locking while computing the response.
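One way the three points above could fit together is sketched below. Everything here is an assumption made for illustration: `TrackerServiceSketch`, `Response`, `heartbeat`, and the `lastSeenId` parameter are hypothetical names, not the actual ResourceTrackerService API. The sketch keeps the last response per node inside the service, updates it synchronously under a lock, and resends it when the NM retries a heartbeat whose response it never saw.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not the real ResourceTrackerService.
class TrackerServiceSketch {
    // Immutable snapshot of a response sent to a node.
    static final class Response {
        final int responseId;
        final List<String> finishedApps;

        Response(int responseId, List<String> finishedApps) {
            this.responseId = responseId;
            this.finishedApps = Collections.unmodifiableList(new ArrayList<>(finishedApps));
        }
    }

    // Last response per node, stored in the service rather than in the node object.
    private final Map<String, Response> lastResponses = new HashMap<>();
    private final Map<String, List<String>> pendingFinishedApps = new HashMap<>();

    synchronized void appFinished(String nodeId, String appId) {
        pendingFinishedApps.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(appId);
    }

    // lastSeenId is the id of the last response the NM actually received.
    // The whole computation runs under the service lock (assumption: a single
    // coarse lock; a per-node lock would reduce contention).
    synchronized Response heartbeat(String nodeId, int lastSeenId) {
        Response last = lastResponses.get(nodeId);
        if (last != null && last.responseId == lastSeenId + 1) {
            // The NM never saw our previous response: resend it unchanged.
            return last;
        }
        List<String> finished = pendingFinishedApps.remove(nodeId);
        if (finished == null) {
            finished = new ArrayList<>();
        }
        Response next = new Response(lastSeenId + 1, finished);
        lastResponses.put(nodeId, next); // updated synchronously, no dispatcher queue
        return next;
    }
}
```

With this shape, a retried heartbeat carrying the same `lastSeenId` receives the stored response, including its finished applications, instead of a freshly computed one.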

            People

              Assignee: Unassigned
              Reporter: Omkar Vinit Joshi (ojoshi)
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated: