CI is broken on master #3042

Closed
kolyshkin opened this issue Jun 25, 2021 · 8 comments · Fixed by #3043

Comments

@kolyshkin
Contributor

kolyshkin commented Jun 25, 2021

This is caused by a combination of #2902 and #3029 (and thus was not caught by CI beforehand).

The failures are

[kir@kir-rhat runc]$ sudo bats tests/integration/checkpoint.bats 
[sudo] password for kir: 
 ✓ checkpoint and restore
 ✗ checkpoint and restore (bind mount, destination is symlink)
   (from function `simple_cr' in file tests/integration/checkpoint.bats, line 122,
    in test file tests/integration/checkpoint.bats, line 141)
     `simple_cr' failed
   runc spec (status=0):
   
   runc run -d --console-socket /tmp/bats-run-3773071/runc.5vbkhQ/tty/sock test_busybox (status=0):
   
   runc state test_busybox (status=0):
   {
     "ociVersion": "1.0.2-dev",
     "id": "test_busybox",
     "pid": 3773432,
     "status": "running",
     "bundle": "/tmp/bats-run-3773071/runc.5vbkhQ/bundle",
     "rootfs": "/tmp/bats-run-3773071/runc.5vbkhQ/bundle/rootfs",
     "created": "2021-06-25T03:20:19.886414615Z",
     "owner": ""
   }
   runc --criu /usr/sbin/criu checkpoint --work-path ./work-dir test_busybox (status=0):
   
   runc state test_busybox (status=1):
   time="2021-06-24T20:20:20-07:00" level=error msg="container \"test_busybox\" does not exist"
   runc --criu /usr/sbin/criu restore -d --work-path ./work-dir --console-socket /tmp/bats-run-3773071/runc.5vbkhQ/tty/sock test_busybox (status=1):
   time="2021-06-24T20:20:20-07:00" level=error msg="criu failed: type NOTIFY errno 0\nlog file: work-dir/restore.log"
   (00.003907) mnt: 		Will mount 1546 @ /tmp/.criu.mntns.X6Ot07/13-0000000000/proc/bus
   (00.003909) mnt: 	Read 1546 mp @ /tmp/.criu.mntns.X6Ot07/13-0000000000/proc/bus
   (00.003915) mnt: 		Will mount 1545 from /0
   (00.003918) mnt: 		Will mount 1545 @ /tmp/.criu.mntns.X6Ot07/13-0000000000/dev/console
   (00.003920) mnt: 	Read 1545 mp @ /tmp/.criu.mntns.X6Ot07/13-0000000000/dev/console
   (00.003927) Error (criu/mount.c:2894): mnt: No mapping for /real/conf mountpoint
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/io.pressure': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/cgroup.procs': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.latency': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.pressure': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.procs': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.events.local': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.weight': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.max.descendants': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.stat': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.weight.nice': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.pressure': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/pids.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.stat': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/pids.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.low': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.pressure': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.type': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.bfq.weight': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.stat': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.events.local': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.high': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.rsvd.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.rsvd.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.threads': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.numa_stat': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.rsvd.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpuset.cpus.partition': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpuset.cpus.effective': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.freeze': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.min': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.controllers': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.events.local': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.oom.group': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpuset.mems': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.high': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/pids.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpuset.mems.effective': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.subtree_control': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.events': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.1GB.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.rsvd.current': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.weight': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpuset.cpus': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.max.depth': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.stat': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/hugetlb.2MB.max': Operation not permitted
   rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-kernel-config.mount/cgroup.events': Operation not permitted
(.....repeats ad infinitum.....)

 ✓ checkpoint and restore with nested bind mounts

10 tests, 1 failure, 1 skipped

rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/io.pressure': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/cgroup.procs': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.events': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.events': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.latency': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/io.pressure': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cgroup.procs': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.events.local': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.current': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/memory.swap.max': Operation not permitted
rm: cannot remove '/tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie/sys-fs-fuse-connections.mount/cpu.weight': Operation not permitted

The "Operation not permitted" is caused by these stale mounts:

[kir@kir-rhat runc]$ mount | grep bats-run
none on /tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO type tmpfs (rw,relatime,seclabel,inode64)
none on /tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie type cgroup2 (rw,relatime,seclabel)
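
For cleaning up by hand, the nested cgroup2 mount has to go before its tmpfs parent. A minimal sketch of that ordering, assuming `mount`-style output as shown above; `mountPointsUnder` is a hypothetical helper, not part of runc or the test suite, and it only prints `umount` commands rather than running them:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mountPointsUnder parses `mount`-style lines of the form
// "<source> on <target> type <fstype> (<options>)" and returns the
// targets that start with prefix, deepest path first, so that nested
// mounts are listed before their parents.
func mountPointsUnder(mountOutput, prefix string) []string {
	var targets []string
	for _, line := range strings.Split(mountOutput, "\n") {
		_, rest, ok := strings.Cut(line, " on ")
		if !ok {
			continue
		}
		idx := strings.LastIndex(rest, " type ")
		if idx < 0 {
			continue
		}
		target := rest[:idx]
		if strings.HasPrefix(target, prefix) {
			targets = append(targets, target)
		}
	}
	// More path separators means a deeper mount point.
	sort.SliceStable(targets, func(i, j int) bool {
		return strings.Count(targets[i], "/") > strings.Count(targets[j], "/")
	})
	return targets
}

func main() {
	out := `none on /tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO type tmpfs (rw,relatime,seclabel,inode64)
none on /tmp/bats-run-3773071/runc.5vbkhQ/bundle/work-dir/.criu.cgyard.mRjLCO/unifie type cgroup2 (rw,relatime,seclabel)`
	for _, t := range mountPointsUnder(out, "/tmp/bats-run") {
		fmt.Println("umount", t)
	}
}
```

With the two lines above as input, the `unifie` cgroup2 mount is listed first and the tmpfs second; once both are unmounted, `rm` on the bats temporary directory should no longer hit "Operation not permitted".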


@kolyshkin
Contributor Author

@liusdu PTAL

@kolyshkin
Contributor Author

# grep Error *.log
restore.log:(00.006210) Error (criu/mount.c:2894): mnt: No mapping for /real/conf mountpoint

There might be a bug in CRIU (it fails to unmount what it mounted on the error path), but it's not the only problem here.

@kolyshkin
Contributor Author

kolyshkin commented Jun 25, 2021

It's too late here to do any real investigation, so I just created a revert PR, #3043.

But we need to investigate it. This may be a bug in runc plus a bug in criu, or something. I hope @liusdu will investigate further (I won't have time this week).

@kolyshkin
Contributor Author

If this turns out to be something simple, though, and we find the fix soon (say, in a day or two), we can surely fix it on top rather than do a revert, but for now I can't think of anything.

@kolyshkin
Contributor Author

I looked into it some more and, for now, could not figure out what is going on. This might or might not be related to the fact that we're having an unprecedented heat wave here in the Pacific Northwest and there's no AC at home.

@liusdu

liusdu commented Jun 28, 2021

@kolyshkin let me take a look at it~

@liusdu

liusdu commented Jun 28, 2021

After a bisect, I found that commit 0ca91f4 introduced this breakage. I will keep digging into it~

@liusdu

liusdu commented Jun 28, 2021

@kolyshkin @cyphar the breakage is caused by the following change from 0ca91f4, which dropped the assignment of the resolved path back to m.Destination:

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 945a0fa..849bf4a 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -1217,7 +1217,6 @@ func (c *linuxContainer) makeCriuRestoreMountpoints(m *configs.Mount) error {
                if err := checkProcMount(c.config.Rootfs, dest, ""); err != nil {
                        return err
                }
-               m.Destination = dest
                if err := os.MkdirAll(dest, 0755); err != nil {
                        return err
                }

So there are two ways to fix this issue:

  • Solution 1: Revert the above change:
diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index cbaa87e..50c8a3c 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -1218,6 +1218,7 @@ func (c *linuxContainer) makeCriuRestoreMountpoints(m *configs.Mount) error {
                if err := checkProcMount(c.config.Rootfs, dest, ""); err != nil {
                        return err
                }
+               m.Destination = dest
                if err := os.MkdirAll(dest, 0o755); err != nil {
                        return err
                }
  • Solution 2: Apply the following change:
diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 0680539..cbaa87e 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -780,6 +780,9 @@ const descriptorsFilename = "descriptors.json"
 
 func (c *linuxContainer) addCriuDumpMount(req *criurpc.CriuReq, m *configs.Mount) {
        mountDest := strings.TrimPrefix(m.Destination, c.config.Rootfs)
+       if dest, err := securejoin.SecureJoin(c.config.Rootfs, mountDest); err == nil {
+               mountDest = dest[len(c.config.Rootfs):]
+       }
        extMnt := &criurpc.ExtMountMap{
                Key: proto.String(mountDest),
                Val: proto.String(mountDest),
@@ -1156,6 +1159,9 @@ func (c *linuxContainer) Checkpoint(criuOpts *CriuOpts) error {
 
 func (c *linuxContainer) addCriuRestoreMount(req *criurpc.CriuReq, m *configs.Mount) {
        mountDest := strings.TrimPrefix(m.Destination, c.config.Rootfs)
+       if dest, err := securejoin.SecureJoin(c.config.Rootfs, mountDest); err == nil {
+               mountDest = dest[len(c.config.Rootfs):]
+       }
        extMnt := &criurpc.ExtMountMap{
                Key: proto.String(mountDest),
                Val: proto.String(m.Source),
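
To illustrate why the key handed to CRIU has to be the symlink-resolved destination, here is a minimal, self-contained sketch. It is only an approximation: it uses `filepath.EvalSymlinks` from the standard library as a stand-in for `securejoin.SecureJoin` (which, unlike this sketch, keeps resolution scoped to the rootfs), and `resolveInRootfs` and `demo` are hypothetical names, not runc functions. The layout mirrors the failing test, assuming `/conf` inside the rootfs is a symlink to `real/conf`:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveInRootfs returns dest (a path as seen inside the container)
// with symlinks resolved, expressed relative to rootfs.
// Assumption: symlinks are relative and stay inside rootfs;
// EvalSymlinks does NOT enforce that the way SecureJoin does.
func resolveInRootfs(rootfs, dest string) (string, error) {
	root, err := filepath.EvalSymlinks(rootfs)
	if err != nil {
		return "", err
	}
	resolved, err := filepath.EvalSymlinks(filepath.Join(root, dest))
	if err != nil {
		return "", err
	}
	return strings.TrimPrefix(resolved, root), nil
}

// demo builds a throwaway rootfs where /conf is a symlink to real/conf
// and returns the resolved form of /conf.
func demo() (string, error) {
	rootfs, err := os.MkdirTemp("", "rootfs-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(rootfs)

	if err := os.MkdirAll(filepath.Join(rootfs, "real", "conf"), 0o755); err != nil {
		return "", err
	}
	if err := os.Symlink(filepath.Join("real", "conf"), filepath.Join(rootfs, "conf")); err != nil {
		return "", err
	}
	return resolveInRootfs(rootfs, "/conf")
}

func main() {
	key, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(key)
}
```

On Linux the configured destination `/conf` resolves to `/real/conf`, which is the mountpoint named in the "No mapping for /real/conf mountpoint" error. An external mount map keyed on the unresolved `/conf` would therefore not match what CRIU sees.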

Since this issue is closed, let me open a new PR to explain this.
