Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Hadoop Flags: Reviewed
Description
After YARN-11674 (Update CpuResourceHandler implementation for cgroup v2 support), the order in which the cpu.cfs_period_us and cpu.cfs_quota_us controls are updated has changed, which can cause the following error when launching containers with CPU limits on cgroup v1:
PrintWriter unable to write to /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us with value: 112500
Reproduction:
I set the following CPU limits for cgroups in yarn-site.xml:
yarn.nodemanager.resource.percentage-physical-cpu-limit: 90
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage: true
After that, the limits were applied to the hadoop-yarn root hierarchy:
[root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_period_us
1000000
[root@pszucs-test-2 hadoop-yarn]# cat cpu.cfs_quota_us
900000
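The root values appear to come from expressing the CPU limit as a fraction of one core and keeping a fixed 1000000 microsecond period whenever that fraction is below one core. The Java sketch below reproduces both numbers from this report under loudly stated assumptions: the node is assumed to have one physical core (the 900000/1000000 ratio implies 0.9 CPUs for YARN), and the container is assumed to hold 1 of 8 vcores, which happens to yield the 112500 quota. `CfsLimits` and `limitsFor` are hypothetical names, and this is one plausible scheme, not Hadoop's actual implementation.

```java
// Hypothetical sketch: derives a cgroup v1 CFS (period, quota) pair from a
// fractional CPU count. Names and constants are illustrative assumptions,
// not copied from Hadoop's CPU resource handler.
public class CfsLimits {
    // Assumed upper bound for period and quota, in microseconds (1 s).
    static final int MAX_US = 1_000_000;

    /** Returns {periodUs, quotaUs} for a CPU share expressed in cores. */
    static int[] limitsFor(double cpus) {
        if (cpus <= 0) {
            throw new IllegalArgumentException("cpus must be > 0");
        }
        if (cpus < 1.0) {
            // Below one core: keep the long period and scale the quota down.
            return new int[] {MAX_US, (int) Math.round(MAX_US * cpus)};
        }
        // One core or more: keep the quota at the maximum and shrink the period.
        return new int[] {(int) Math.round(MAX_US / cpus), MAX_US};
    }

    public static void main(String[] args) {
        // Assumption: 1 physical core at a 90% limit gives YARN 0.9 CPUs.
        int[] node = limitsFor(0.9);
        System.out.println("node: period=" + node[0] + " quota=" + node[1]);
        // Assumption: a container holding 1 of 8 vcores gets 0.9 / 8 CPUs.
        int[] container = limitsFor(0.9 / 8);
        System.out.println("container: period=" + container[0] + " quota=" + container[1]);
    }
}
```

Under these assumptions the node limit is period=1000000, quota=900000 and the container limit is period=1000000, quota=112500, matching the values in the report.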
When I tried to launch a container, it failed with the following error:
PrintWriter unable to write to /var/cgroupv1/cpu/hadoop-yarn/container_e02_1727079571170_0040_02_000001/cpu.cfs_quota_us with value: 112500
This is because, with a cfs_quota_us value of 112500, the container tries to exceed the limit defined at a higher level. A manually created test cgroup behaves the same way: it accepts cfs_quota_us values only up to 90000:
[root@pszucs-test-2 hadoop-yarn]# cat test/cpu.cfs_period_us
100000
[root@pszucs-test-2 hadoop-yarn]# echo "90001" > test/cpu.cfs_quota_us
-bash: echo: write error: Invalid argument
[root@pszucs-test-2 hadoop-yarn]# echo "90000" > test/cpu.cfs_quota_us
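The rejected writes are consistent with a simple rule: on cgroup v1, a child cgroup's cfs_quota_us / cfs_period_us ratio may not exceed its parent's. The following Java sketch models only that rule (the kernel's real check is more involved); `CfsRatioCheck` and `fitsUnderParent` are illustrative names, not a real API.

```java
// Hypothetical model of the cgroup v1 CFS bandwidth rule observed above:
// a child cgroup's quota/period ratio may not exceed its parent's.
// This is a simplification for illustration, not the kernel's code.
public class CfsRatioCheck {
    /** True if a child (period, quota) fits under the parent's CPU bandwidth. */
    static boolean fitsUnderParent(long parentPeriodUs, long parentQuotaUs,
                                   long childPeriodUs, long childQuotaUs) {
        // Compare childQuota/childPeriod <= parentQuota/parentPeriod by
        // cross-multiplying, which avoids floating-point rounding.
        return childQuotaUs * parentPeriodUs <= parentQuotaUs * childPeriodUs;
    }

    public static void main(String[] args) {
        long parentPeriod = 1_000_000, parentQuota = 900_000; // hadoop-yarn: 0.9 CPU
        // A freshly created cgroup still has the default period of 100000.
        System.out.println(fitsUnderParent(parentPeriod, parentQuota, 100_000, 90_000));    // true
        System.out.println(fitsUnderParent(parentPeriod, parentQuota, 100_000, 90_001));    // false
        System.out.println(fitsUnderParent(parentPeriod, parentQuota, 100_000, 112_500));   // false
        // The same quota fits once the period has been set to 1000000.
        System.out.println(fitsUnderParent(parentPeriod, parentQuota, 1_000_000, 112_500)); // true
    }
}
```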
Solution:
The cause of this issue is that the cfs_period_us control gets the default value of 100000 when a new cgroup is created, whereas YARN uses 1000000 as the period when it calculates the limit. Therefore cpu.cfs_period_us has to be updated before cpu.cfs_quota_us, so that the ratio between the two values is preserved and the limit defined at the parent level is not exceeded.
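To illustrate why the order matters, the following Java sketch uses a hypothetical in-memory stand-in for a container cgroup (`FakeCgroup`) that starts with the default 100000 period and an unlimited quota, and rejects any write that would push the ratio above the parent's 0.9. Writing the quota first fails, while writing the period first and then the quota succeeds. `FakeCgroup` and `applyLimits` are illustrative names, not Hadoop's code, and the model ignores other kernel constraints.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a cgroup v1 cpu controller directory.
// It rejects writes that would push quota/period above the parent's ratio,
// mimicking the "Invalid argument" error above. Not Hadoop's or the kernel's API.
public class CfsUpdateOrder {
    static final long PARENT_PERIOD = 1_000_000, PARENT_QUOTA = 900_000;

    static class FakeCgroup {
        final Map<String, Long> params = new HashMap<>();

        FakeCgroup() {
            params.put("cpu.cfs_period_us", 100_000L); // default period of a new cgroup
            params.put("cpu.cfs_quota_us", -1L);       // -1 means no quota
        }

        void write(String param, long value) {
            Map<String, Long> next = new HashMap<>(params);
            next.put(param, value);
            long period = next.get("cpu.cfs_period_us");
            long quota = next.get("cpu.cfs_quota_us");
            // Simplification: an unlimited quota (-1) is always allowed.
            if (quota >= 0 && quota * PARENT_PERIOD > PARENT_QUOTA * period) {
                throw new IllegalArgumentException("write error: Invalid argument");
            }
            params.putAll(next);
        }
    }

    /** Applies a container limit, writing the period before the quota. */
    static void applyLimits(FakeCgroup cg, long periodUs, long quotaUs) {
        cg.write("cpu.cfs_period_us", periodUs); // ratio stays valid at every step
        cg.write("cpu.cfs_quota_us", quotaUs);
    }

    public static void main(String[] args) {
        FakeCgroup quotaFirst = new FakeCgroup();
        try {
            quotaFirst.write("cpu.cfs_quota_us", 112_500); // 112500 / 100000 = 1.125 > 0.9
        } catch (IllegalArgumentException e) {
            System.out.println("quota first: " + e.getMessage());
        }
        FakeCgroup periodFirst = new FakeCgroup();
        applyLimits(periodFirst, 1_000_000, 112_500);
        System.out.println("period first: " + periodFirst.params.get("cpu.cfs_quota_us"));
    }
}
```

In this model, writing the period first is safe because the new cgroup starts without a quota, so the intermediate state (period 1000000, quota -1) never exceeds the parent's limit.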
Issue Links
Blocked
YARN-11674 Update CpuResourceHandler implementation for cgroup v2 support (Resolved)