  Hadoop YARN / YARN-11703

Validate accessibility of Node Manager working directories

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • 3.5.0
    • None
    • Component/s: yarn
    • None
    • Hadoop Flags: Reviewed

    Description

      Problem:

      If the permissions of a subdirectory or file under yarn.nodemanager.local-dirs or yarn.nodemanager.log-dirs change so that the NodeManager can no longer access it, the NodeManager does not transition to an unhealthy state, yet containers launched on that node fail.

      Testing:

      • Run the example Pi job on a cluster.
      • Make the user's cache directory unreadable by the NodeManager, for example:
        chmod 222 ./usercache/{user}
        
      • The cluster state stays healthy.
      • Re-run the Pi job.
      • Containers fail on the affected node with:
      ... Not able to initialize app-cache directories in any of the configured local directories for user ...
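The reproduction can be modelled in a few lines of Java, the NodeManager's implementation language. This is a minimal, hypothetical sketch: the class name, the paths and the user `alice` are illustrative, every entry in the model is a directory, and it assumes the NodeManager runs as the owner of each entry so that only the owner permission bits matter (real deployments may rely on group bits instead). It shows how a check that looks only at the configured directory itself keeps reporting it as usable, while a nested `usercache/{user}` directory set to mode 222 (`-w--w--w-`) can be neither listed nor traversed.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: directory path -> POSIX permission string (as set by chmod).
final class PermissionModel {

    // Assumed rule: a directory is usable only if the owner bits allow
    // listing (r) and traversal (x).
    static boolean ownerCanUseDir(String perms) {
        Set<PosixFilePermission> set = PosixFilePermissions.fromString(perms);
        return set.contains(PosixFilePermission.OWNER_READ)
                && set.contains(PosixFilePermission.OWNER_EXECUTE);
    }

    // Looks only at the configured directory itself, ignoring its contents.
    static boolean rootOnlyCheck(Map<String, String> tree, String root) {
        return ownerCanUseDir(tree.get(root));
    }

    // Looks at the configured directory and every entry beneath it.
    static boolean contentCheck(Map<String, String> tree, String root) {
        for (Map.Entry<String, String> entry : tree.entrySet()) {
            String path = entry.getKey();
            boolean underRoot = path.equals(root) || path.startsWith(root + "/");
            if (underRoot && !ownerCanUseDir(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> tree = new LinkedHashMap<>();
        tree.put("/data/nm-local", "rwxr-xr-x");
        tree.put("/data/nm-local/usercache", "rwxr-xr-x");
        tree.put("/data/nm-local/usercache/alice", "-w--w--w-"); // after chmod 222
        System.out.println("root-only check passes: " + rootOnlyCheck(tree, "/data/nm-local"));
        System.out.println("content check passes: " + contentCheck(tree, "/data/nm-local"));
    }
}
```

Running `main` prints `root-only check passes: true` followed by `content check passes: false`, which mirrors the symptom above: the node looks healthy even though a directory it needs is unusable.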

      Solution:

      Add an extra validation to DirectoryCollection#testDirs that checks whether the contents of the local-dirs and log-dirs are accessible by the NodeManager, and marks the node as unhealthy if they are not.
      A new flag controls this validation: yarn.nodemanager.working-dir-content-accessibility-validation.enabled (default: true).
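A content-level check of the kind proposed above might walk each working directory and collect every entry the NodeManager process cannot access. The sketch below is hypothetical: `DirContentValidator` and `findInaccessible` are illustrative names, not the code merged for this issue, and it assumes "accessible" means readable for files and readable plus traversable for directories. The `enabled` parameter stands in for the new flag; how the result feeds into DirectoryCollection and the node health status is not shown.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the actual YARN-11703 implementation.
final class DirContentValidator {

    // Returns every path under root (including root) that the current process
    // cannot read, or cannot traverse if it is a directory. When enabled is
    // false the walk is skipped, standing in for the on/off configuration flag.
    static List<Path> findInaccessible(Path root, boolean enabled) throws IOException {
        List<Path> inaccessible = new ArrayList<>();
        if (!enabled) {
            return inaccessible; // validation switched off: report nothing
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // The directory could be opened for listing; it must also be traversable.
                if (!Files.isExecutable(dir)) {
                    inaccessible.add(dir);
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (!Files.isReadable(file)) {
                    inaccessible.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Reached when attributes cannot be read or a directory cannot be
                // opened for listing (for example after chmod 222).
                inaccessible.add(file);
                return FileVisitResult.CONTINUE;
            }
        });
        return inaccessible;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nm-local");
        Files.createDirectories(root.resolve("usercache/alice"));
        List<Path> bad = findInaccessible(root, true);
        System.out.println(bad.isEmpty() ? "node healthy" : "node unhealthy: " + bad);
    }
}
```

`Files.walkFileTree` is used instead of `Files.walk` so that an unreadable directory is reported through `visitFileFailed` rather than aborting the traversal with an exception. Because the checks use the process's effective permissions, they generally report nothing when the process runs as root.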

          People

            Assignee: Bence Kosztolnik (bkosztolnik)
            Reporter: Bence Kosztolnik (bkosztolnik)
            Votes: 0
            Watchers: 1
