Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
1.4.2, 1.5.1, 1.6.1, 1.7.0
Description
A launch nested container call might fail with the following error:
Failed to enter mount namespace: Failed to open '/proc/29473/ns/mnt': No such file or directory
This happens when the containerizer launcher tries to enter the `mnt` namespace using the PID of a process that has already terminated. The agent detected that PID before spawning the containerizer launcher process, while the process was still running.
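The failure mode itself can be reproduced outside Mesos. The following is a minimal sketch, not the Mesos code: it records the PID of a short-lived child, waits for it to exit, and then tries to open that PID's `mnt` namespace file. The helper name `try_open_mnt_ns` is hypothetical, and the sketch assumes a Linux-style `/proc` layout.

```python
import os
import subprocess
import sys


def try_open_mnt_ns(pid):
    """Try to open /proc/<pid>/ns/mnt.

    Returns None on success, or an error string on failure.
    (Hypothetical helper for illustration; not part of Mesos.)
    """
    path = f"/proc/{pid}/ns/mnt"
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        return f"Failed to open '{path}': {e.strerror}"
    os.close(fd)
    return None


# Step 1: "detect" a PID while the process is still running.
child = subprocess.Popen([sys.executable, "-c", "pass"])
pid = child.pid

# Step 2: the process terminates (and is reaped) before the PID is used.
child.wait()

# Step 3: using the stale PID fails, because /proc/<pid> is gone.
error = try_open_mnt_ns(pid)
print(error)
```

The gap between steps 1 and 3 is the window in which the agent holds a PID that no longer refers to a live process.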
The issue can be reproduced using the following test (pseudocode):
    launchTask("sleep 1000")
    parentContainerId = containerizer.containers().begin()

    outputs = []
    for i in range(10):
        ContainerId containerId
        containerId.parent = parentContainerId
        containerId.id = UUID.random()

        LAUNCH_NESTED_CONTAINER_SESSION(containerId, "echo echo")
        response = ATTACH_CONTAINER_OUTPUT(containerId)

        outputs.append(response.reader)

    for output in outputs:
        stdout, stderr = getProcessIOData(output)
        assert("echo" == stdout + stderr)
When we start the very first nested container, `getMountNamespaceTarget()` returns the PID of the task (`sleep 1000`), because it is the only process whose `mnt` namespace differs from that of the parent container. This nested container becomes a child of the PID 1 process, which is also the parent of the command executor. It's not an executor's child! This can be seen in the attached `pstree.png`.
When we start a second nested container, `getMountNamespaceTarget()` might return the PID of the previous nested container (`echo echo`) instead of the task's PID (`sleep 1000`). This happens because the first nested container entered the `mnt` namespace of the task, so its `mnt` namespace also differs from that of the parent container. The containerizer launcher ("nanny" process) then attempts to enter the `mnt` namespace using the PID of a terminated process, so we get this error.
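The ambiguity can be illustrated with a toy model. This is a simplified sketch of the selection criterion described above, not the actual `getMountNamespaceTarget()` implementation: processes are modelled as a mapping from PID to `mnt` namespace ID, and every PID and namespace ID below is a made-up value.

```python
from typing import Dict, List


def mount_namespace_candidates(parent_ns: int,
                               processes: Dict[int, int]) -> List[int]:
    """Return PIDs whose mnt namespace differs from the parent container's.

    Simplified model of the criterion described in this ticket; the real
    Mesos logic is assumed to be more involved.
    """
    return sorted(pid for pid, ns in processes.items() if ns != parent_ns)


# Hypothetical namespace IDs and PIDs, for illustration only.
PARENT_NS = 1   # mnt namespace of the parent container (command executor)
TASK_NS = 2     # mnt namespace of the task (`sleep 1000`)

# Before the first nested container: executor (100) and task (101).
processes = {100: PARENT_NS, 101: TASK_NS}
first = mount_namespace_candidates(PARENT_NS, processes)

# The first nested container (102) enters the task's mnt namespace.
processes[102] = TASK_NS
second = mount_namespace_candidates(PARENT_NS, processes)

print(first)   # only the task qualifies
print(second)  # the task and the short-lived nested container both qualify
```

In this model the second lookup has two candidates. If the short-lived one (102) is picked and exits before the launcher uses its PID, the launcher ends up with a stale PID.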