Hadoop YARN / YARN-11718

Provide config option to not shutdown NM if it is decommissioned

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      Currently, an NM cannot be started if it is marked as decommissioned on the RM (that is, it is in the exclude list), because the RM sends a SHUTDOWN signal when the NM tries to send a heartbeat after starting up:

      https://github.com/apache/hadoop/blob/1655acc5e2d5fe27e01f46ea02bd5a7dea44fe12/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java#L455-L465 

          // Check if this node is a 'valid' node
          if (!this.nodesListManager.isValidNode(host) &&
              !isNodeInDecommissioning(nodeId)) {
            String message =
                "Disallowed NodeManager from  " + host
                    + ", Sending SHUTDOWN signal to the NodeManager.";
            LOG.info(message);
            response.setDiagnosticsMessage(message);
            response.setNodeAction(NodeAction.SHUTDOWN);
            return response;
          } 

      This couples the start/stop operations of the NM service very tightly to its state in the RM, making it difficult to manage large fleets of NMs independently of the RM.

      For example, with such an option, after an NM OS upgrade we would be able to start the NM, recommission it, and then check its state without worrying about the order of the start and recommission operations. This matters especially when we don't control the start operation, which is the case in large companies where starting the service is part of the OS upgrade pipeline. The current behavior can also cause deployment failures on decommissioned nodes if the deployment pipeline checks for a running service before marking the deploy as succeeded.

      The patch will look something like this:

          // Check if this node is a 'valid' node
          if (!this.nodesListManager.isValidNode(host) &&
              !isNodeInDecommissioning(nodeId) &&
      +       !this.noNMShutdownForInvalidNodes) {
            String message =
                "Disallowed NodeManager from  " + host
                    + ", Sending SHUTDOWN signal to the NodeManager.";
            LOG.info(message);
            response.setDiagnosticsMessage(message);
            response.setNodeAction(NodeAction.SHUTDOWN);
            return response;
          } 
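
      To make the intended behavior concrete, here is a minimal standalone sketch of the gate, not the actual patch. It does not use any Hadoop classes: java.util.Properties stands in for the YARN configuration, the Action enum stands in for NodeAction, and the property name is hypothetical, since this issue does not define one. The default is assumed to be false so that existing clusters keep the current behavior.

```java
import java.util.Properties;

/**
 * Standalone sketch of the proposed gate. Every name in this class is
 * illustrative; none of it is existing YARN API.
 */
final class NmShutdownGateSketch {

  /** Stand-in for the two NodeAction outcomes that matter here. */
  enum Action { NORMAL, SHUTDOWN }

  /** Hypothetical property name (assumption; not defined by the issue). */
  static final String NO_SHUTDOWN_KEY =
      "yarn.resourcemanager.no-nm-shutdown-for-invalid-nodes";

  private final boolean noNMShutdownForInvalidNodes;

  NmShutdownGateSketch(Properties conf) {
    // Assumed default of false: behavior is unchanged unless opted in.
    this.noNMShutdownForInvalidNodes =
        Boolean.parseBoolean(conf.getProperty(NO_SHUTDOWN_KEY, "false"));
  }

  /**
   * Mirrors the condition in the proposed patch. NORMAL only means
   * "no SHUTDOWN is sent"; how the RM treats the node afterwards is
   * left open by the issue.
   */
  Action decide(boolean isValidNode, boolean isDecommissioning) {
    if (!isValidNode && !isDecommissioning && !noNMShutdownForInvalidNodes) {
      return Action.SHUTDOWN;
    }
    return Action.NORMAL;
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    Properties optIn = new Properties();
    optIn.setProperty(NO_SHUTDOWN_KEY, "true");

    // An excluded node that is not in the DECOMMISSIONING state.
    System.out.println(new NmShutdownGateSketch(defaults).decide(false, false)); // SHUTDOWN
    System.out.println(new NmShutdownGateSketch(optIn).decide(false, false));    // NORMAL
  }
}
```

      With the flag left at its assumed default, an excluded node is still told to shut down; with the flag enabled, the NM process stays up and recommissioning can happen in any order relative to the start.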

       

          People

            Assignee: Unassigned
            Reporter: Aswin M Prabhu (aswinmprabhu)