  1. Hadoop YARN
  2. YARN-5139 [Umbrella] Move YARN scheduler towards global scheduler
  3. YARN-8546

Resource leak caused by a reserved container being released more than once under async scheduling

Details

    • Reviewed

    Description

      I was able to reproduce this issue by starting a job that keeps requesting containers until it uses up the cluster's available resources. My cluster has 70200 vcores and each task requests 100 vcores, so I expected a total of 702 containers to be allocated, but in the end only 701 were. The last container could not be allocated because the queue's used resource had been updated to more than 100%.
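
      The arithmetic behind the expected and observed container counts can be sketched as follows. This is only an illustration, not scheduler code: the class and method names are hypothetical, and the leaked amount of 100 vcores is an assumption (the report states only that used resource exceeded 100%, not by how much).

```java
// Hypothetical illustration of the numbers in the report; not YARN code.
class QueueCapacitySketch {

    // Whole containers that fit when accounting is correct.
    static long expectedContainers(long clusterVcores, long vcoresPerContainer) {
        return clusterVcores / vcoresPerContainer;
    }

    // Whole containers that fit when some vcores are wrongly counted as used.
    // leakedVcores is an assumed value, chosen only for illustration.
    static long allocatableContainers(long clusterVcores,
                                      long vcoresPerContainer,
                                      long leakedVcores) {
        long free = Math.max(0, clusterVcores - leakedVcores);
        return free / vcoresPerContainer;
    }

    public static void main(String[] args) {
        // 70200 vcores in the cluster, 100 vcores per task (from the report).
        System.out.println(expectedContainers(70200, 100));         // 702
        // Assumed leak of 100 vcores leaves room for one container fewer.
        System.out.println(allocatableContainers(70200, 100, 100)); // 701
    }
}
```

      Under this sketch, any leak between 1 and 100 vcores is enough to block the last container, because the remaining free vcores no longer cover a full 100-vcore request.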

      Attachments

        1. YARN-8546.branch-2.10.001.patch
          9 kB
          Eric Payne
        2. YARN-8546.001.patch
          9 kB
          Tao Yang

          People

            Tao Yang
            cheersyang Weiwei Yang
            Votes: 0
            Watchers: 8