  1. Hadoop YARN
  2. YARN-128 [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery
  3. YARN-1405

RM hangs on shutdown if calling System.exit in serviceInit or serviceStart

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Set yarn.resourcemanager.recovery.enabled=true and pass a local path, such as "file:///tmp/MYTMP", to yarn.resourcemanager.fs.state-store.uri.

      If the directory /tmp/MYTMP is not readable or writable, the RM should crash and print a "Permission denied" error.

      Currently, the RM instead throws "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist". The RM logs "Exiting with status 1", but the RM process does not shut down.
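
      The expected behavior described above can be illustrated with a minimal sketch that checks the state-store directory up front and reports a permission problem explicitly. This is plain JDK code, not the contents of YARN-1405.1.patch; the class name StateStoreDirCheck and the method verifyUsable are hypothetical, and the real state store goes through the Hadoop FileSystem API rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class StateStoreDirCheck {

    /** Throws a descriptive exception if dir cannot serve as a state-store root. */
    public static void verifyUsable(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new NoSuchFileException(dir.toString(), null, "not an existing directory");
        }
        // Report the permission problem directly instead of letting a later
        // listing of a child directory fail with a misleading "does not exist".
        if (!Files.isReadable(dir) || !Files.isWritable(dir)) {
            throw new AccessDeniedException(dir.toString(), null, "Permission denied");
        }
    }

    public static void main(String[] args) {
        // /tmp/MYTMP is the example path from the report.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/MYTMP");
        try {
            verifyUsable(dir);
            System.out.println("State-store directory is usable: " + dir);
        } catch (IOException e) {
            // A real daemon would exit with a non-zero status at this point.
            System.err.println("Cannot use state-store directory: " + e.getMessage());
        }
    }
}
```

      Checking both read and write access before loading state would let the failure message name the actual cause.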

      Snapshot of the ResourceManager log:

      2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
      2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
      java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist
      at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
      at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
      at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
      at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188)
      at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635)
      at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
      2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
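
      The log ends with "Exiting with status 1" while the process stays alive. One plausible explanation, which is an assumption and is not confirmed anywhere in this report, is that the exit is requested while the start path still holds a lock that a shutdown hook needs in order to stop the service: the exit waits for the hook, and the hook waits for the lock. The sketch below simulates that pattern with plain JDK threads and does not call System.exit; the class ExitInsideStartSketch and all of its members are hypothetical stand-ins, not Hadoop code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simulation, for illustration only; not Hadoop code.
public class ExitInsideStartSketch {

    // Stand-in for a lock guarding service state transitions (assumed).
    private final ReentrantLock stateLock = new ReentrantLock();

    /** Simulates a shutdown hook that must take the lock before stopping the service. */
    private boolean hookCanStop(long timeoutMs) throws InterruptedException {
        final boolean[] acquired = {false};
        Thread hook = new Thread(() -> {
            try {
                // A real hook would block indefinitely; the timeout keeps the demo finite.
                if (stateLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    stateLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        hook.start();
        hook.join(); // plays the role of the exit call waiting for hooks to finish
        return acquired[0];
    }

    /** Exit requested while the start path still holds the lock. */
    public boolean exitInsideStart(long timeoutMs) throws InterruptedException {
        stateLock.lock();
        try {
            return hookCanStop(timeoutMs);
        } finally {
            stateLock.unlock();
        }
    }

    /** Exit requested after the start path has released the lock. */
    public boolean exitAfterStart(long timeoutMs) throws InterruptedException {
        stateLock.lock();
        stateLock.unlock();
        return hookCanStop(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        ExitInsideStartSketch sketch = new ExitInsideStartSketch();
        System.out.println("hook can stop (exit inside start): " + sketch.exitInsideStart(200));
        System.out.println("hook can stop (exit after start):  " + sketch.exitAfterStart(200));
    }
}
```

      Under this assumption, letting the failure propagate out of serviceStart and exiting only after the lock is released would avoid the hang. The attached rm-threaddump.out is the place to confirm which threads are actually blocked.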

      Attachments

        1. YARN-1405.1.patch
          5 kB
          Jian He
        2. rm-threaddump.out
          17 kB
          Jian He

          People

            Assignee: Jian He (jianhe)
            Reporter: Yesha Vora (yeshavora)