Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
-
Mesosphere Sprint 73
-
2
Description
We get resource statistics for Docker tasks by first reading the cgroup subsystem path from /proc/<pid>/cgroup (taking the cpuacct subsystem as an example):
9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
We then read /sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat to get the statistics:
user 4
system 0
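The lookup described above can be sketched in a few lines of C++. This is only an illustration, not the Mesos implementation: `findCgroupPath` and `statPath` are hypothetical helpers, and the sketch assumes cgroup v1 lines of the form `<id>:<subsystems>:<path>`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: scans the text of /proc/<pid>/cgroup for a line whose
// comma-separated subsystem list contains `subsystem`, and stores that line's
// cgroup path in `path`. Returns false if no line matches.
bool findCgroupPath(
    const std::string& procCgroupText,
    const std::string& subsystem,
    std::string* path)
{
  std::istringstream lines(procCgroupText);
  std::string line;
  while (std::getline(lines, line)) {
    // Each line looks like "9:cpuacct,cpu:/docker/<container id>".
    const std::string::size_type first = line.find(':');
    if (first == std::string::npos) continue;
    const std::string::size_type second = line.find(':', first + 1);
    if (second == std::string::npos) continue;

    std::istringstream subsystems(line.substr(first + 1, second - first - 1));
    std::string name;
    while (std::getline(subsystems, name, ',')) {
      if (name == subsystem) {
        *path = line.substr(second + 1);
        return true;
      }
    }
  }
  return false;
}

// Hypothetical helper: joins the hierarchy mount point, the cgroup path and
// the stat file name. The cgroup path already starts with '/', which is why
// the resulting paths contain a double (or, for the root cgroup, triple) slash.
std::string statPath(const std::string& hierarchy, const std::string& cgroup)
{
  return hierarchy + "/" + cgroup + "/cpuacct.stat";
}
```

Note that nothing in this lookup distinguishes a container cgroup from the root cgroup: whatever path the kernel reports for the process is used as-is.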
However, when a Docker container is being torn down, it seems that Docker or the operating system first moves the process to the root cgroup before actually killing it, making /proc/<pid>/cgroup look like the following:
9:cpuacct,cpu:/
This makes a racy call to cgroup::internal::cgroup() return a single '/', which in turn makes DockerContainerizerProcess::cgroupsStatistics() read /sys/fs/cgroup/cpuacct///cpuacct.stat, which contains the statistics for the root cgroup:
user 228058750
system 24506461
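One way to avoid reporting host-wide numbers for a container would be to refuse to read statistics when the resolved cgroup is the root. The check below is a minimal sketch of that idea under this assumption; `isRootCgroup` is a hypothetical helper and is not claimed to be the fix that was applied for this issue.

```cpp
#include <string>

// Hypothetical helper: returns true when `cgroup` refers to the root cgroup.
// An empty string, "/", or any run of slashes is treated as root, since none
// of them names a container-specific cgroup such as "/docker/<container id>".
bool isRootCgroup(const std::string& cgroup)
{
  return cgroup.find_first_not_of('/') == std::string::npos;
}
```

A caller could treat a root result as "the container is going away" and return an error or empty statistics instead of reading cpuacct.stat.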
This can be reproduced by test.cpp with the following command:
$ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
...
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
user 4
system 0
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Reading file '/proc/44224/cgroup'
Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
user 228058750
system 24506461
Failed to open file '/proc/44224/cgroup'
sleep
[2]- Exit 1 ./test $(docker inspect sleep | jq .[].State.Pid)
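The attached test.cpp is not reproduced in this report. The following is a rough sketch of a polling loop that would produce output of the same shape, written under stated assumptions: the function names are hypothetical, the cpuacct hierarchy is assumed to be mounted at /sys/fs/cgroup/cpuacct (cgroup v1), and the subsystem match is a simple substring search.

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

// Reads a whole file into `out`, logging in the style of the output above.
// Returns false if the file cannot be opened.
bool readFile(const std::string& path, std::string* out)
{
  std::ifstream in(path);
  if (!in.is_open()) {
    std::cerr << "Failed to open file '" << path << "'" << std::endl;
    return false;
  }
  std::cout << "Reading file '" << path << "'" << std::endl;
  std::ostringstream buffer;
  buffer << in.rdbuf();
  *out = buffer.str();
  return true;
}

// Returns the cgroup path of the first line whose subsystem field mentions
// "cpuacct", or an empty string if there is none.
std::string cpuacctCgroup(const std::string& procCgroupText)
{
  std::istringstream lines(procCgroupText);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string::size_type first = line.find(':');
    if (first == std::string::npos) continue;
    const std::string::size_type second = line.find(':', first + 1);
    if (second == std::string::npos) continue;
    const std::string subsystems = line.substr(first + 1, second - first - 1);
    if (subsystems.find("cpuacct") != std::string::npos) {
      return line.substr(second + 1);
    }
  }
  return "";
}

// Polls the cpuacct statistics of `pid` until /proc/<pid>/cgroup can no
// longer be opened. Returns the number of completed polls.
int pollCpuacct(int pid, int intervalMs)
{
  int polls = 0;
  std::string text;
  while (readFile("/proc/" + std::to_string(pid) + "/cgroup", &text)) {
    std::string stat;
    const std::string path =
      "/sys/fs/cgroup/cpuacct/" + cpuacctCgroup(text) + "/cpuacct.stat";
    if (readFile(path, &stat)) {
      std::cout << stat;
    }
    ++polls;
    std::this_thread::sleep_for(std::chrono::milliseconds(intervalMs));
  }
  return polls;
}
```

A small `main` that passes the PID from the command line to `pollCpuacct` would be enough to drive this against a container that is being removed.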