Hadoop YARN / YARN-11733

Fix the order of updating CPU controls with cgroup v1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: yarn
    • Hadoop Flags: Reviewed

    Description

      After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2 support), the order in which the cpu.cfs_period_us and cpu.cfs_quota_us controls are updated has changed, which can cause the error below when launching containers with CPU limits on cgroup v1:

      PrintWriter unable to write to /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us with value: 112500

       

      Reproduction:

      I set the following CPU limits for cgroups in yarn-site.xml:

      yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
      yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true

      After that the limits were applied on the hadoop-yarn root hierarchy:

      [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
      1000000
      [root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
      900000
      

      When I tried to launch a container it gave me the following error:

      PrintWriter unable to write to /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us with value: 112500

      This happens because the container's cfs_quota_us value of 112 500 exceeds the limit defined at the parent level. Creating a test cgroup manually shows the same behavior: the control accepts values only up to 90 000:

      [root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
      100000
      [root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
      -bash: echo: write error: Invalid argument
      [root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us
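
The behavior in the transcript above is consistent with the kernel comparing the child's quota/period ratio against the parent's. The following is a simplified Python sketch of that comparison, not the kernel's actual code: the function name `quota_allowed` is hypothetical, and details such as fixed-point rounding are ignored.

```python
from fractions import Fraction


def quota_allowed(child_quota_us, child_period_us,
                  parent_quota_us, parent_period_us):
    """Return True if the child's quota/period ratio fits within the parent's.

    Simplified model (an assumption, not kernel code): a child cgroup may
    not claim a larger share of CPU time than its parent is allowed.
    """
    child_ratio = Fraction(child_quota_us, child_period_us)
    parent_ratio = Fraction(parent_quota_us, parent_period_us)
    return child_ratio <= parent_ratio


# Parent hadoop-yarn hierarchy from the issue: 900000 / 1000000 = 0.9 CPU.
PARENT = (900_000, 1_000_000)

# With the default child period of 100000:
print(quota_allowed(90_000, 100_000, *PARENT))    # True  (0.9 == 0.9)
print(quota_allowed(90_001, 100_000, *PARENT))    # False (just above 0.9)
print(quota_allowed(112_500, 100_000, *PARENT))   # False (1.125 > 0.9)

# With the period YARN intends for the container, 1000000:
print(quota_allowed(112_500, 1_000_000, *PARENT)) # True  (0.1125 <= 0.9)
```

Under this model, 112 500 is only too large because it is judged against the default period of 100 000 rather than the 1 000 000 that YARN used in its calculation.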

       

      Solution:

      The cause of this issue is that the cfs_period_us control gets the default value of 100 000 when a new cgroup is created, but YARN uses 1 000 000 when it calculates the limit. Because of this, cpu.cfs_period_us must be updated before cpu.cfs_quota_us, so that the ratio between the two values is preserved and the limit defined at the parent level is not exceeded.
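
The effect of the write order can be illustrated with a small simulation. This is a sketch under stated assumptions, not YARN's or the kernel's implementation: `SimulatedCgroup` and `apply_limits` are hypothetical names, a new cgroup is assumed to start with a period of 100 000 and no quota (-1), and a write is rejected whenever the resulting quota/period ratio would exceed the parent's.

```python
import errno
from fractions import Fraction

DEFAULT_PERIOD_US = 100_000  # default cpu.cfs_period_us, as described in the issue
NO_QUOTA = -1                # assumed initial cpu.cfs_quota_us (no limit)


class SimulatedCgroup:
    """Hypothetical in-memory stand-in for a child cgroup under a limited parent."""

    def __init__(self, parent_quota_us, parent_period_us):
        self.parent_ratio = Fraction(parent_quota_us, parent_period_us)
        self.period_us = DEFAULT_PERIOD_US
        self.quota_us = NO_QUOTA

    def _check(self, quota_us, period_us):
        # Reject any combination whose ratio exceeds the parent's ratio.
        if quota_us != NO_QUOTA and Fraction(quota_us, period_us) > self.parent_ratio:
            raise OSError(errno.EINVAL, "Invalid argument")

    def write_period(self, period_us):
        self._check(self.quota_us, period_us)
        self.period_us = period_us

    def write_quota(self, quota_us):
        self._check(quota_us, self.period_us)
        self.quota_us = quota_us


def apply_limits(cgroup, period_us, quota_us, period_first):
    """Write both controls in the given order; return True if both writes succeed."""
    try:
        if period_first:
            cgroup.write_period(period_us)
            cgroup.write_quota(quota_us)
        else:
            cgroup.write_quota(quota_us)
            cgroup.write_period(period_us)
    except OSError:
        return False
    return True


# Parent limits and container values taken from the issue description.
quota_first = apply_limits(SimulatedCgroup(900_000, 1_000_000),
                           1_000_000, 112_500, period_first=False)
period_first = apply_limits(SimulatedCgroup(900_000, 1_000_000),
                            1_000_000, 112_500, period_first=True)
print(quota_first)   # False: 112500 / 100000 exceeds the parent's 0.9
print(period_first)  # True: 112500 / 1000000 is within the parent's 0.9
```

In the quota-first order the quota is checked against the still-default period, which reproduces the failure; in the period-first order the same quota is accepted.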

            People

              Assignee: Peter Szucs (pszucs)
              Reporter: Peter Szucs (pszucs)