
Docker Swarm Services with IPv6 #30

Open · mtelvers opened this issue Mar 3, 2023 · 9 comments

mtelvers (Collaborator) commented Mar 3, 2023

IPv6 does not work as expected when services are deployed with docker stack deploy.

Take this trivial docker-compose.yml as an example:

version: "3.9"

services:

  caddy:
    image: caddy
    ports:
      - "80:80"
    volumes:
      - /etc/caddy:/etc/caddy:ro

  nginx:
    image: nginx

/etc/caddy/Caddyfile contains a reverse proxy definition like this:

:80 {
        reverse_proxy nginx:80
}

Basic connectivity can be checked by running docker compose up and then curl http://<ipv4>:80 and curl http://<ipv6>:80, both of which return the nginx default website.

After docker swarm init, the stack can be deployed using docker stack deploy --compose-file ./docker-compose.yml test. The IPv6 connectivity test now fails, while IPv4 works as expected.
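A condensed reproduction of both checks, assuming the commands are run from the directory containing the compose file and the host's real addresses are substituted for the placeholders:

# With plain compose, both address families work:
docker compose up -d
curl http://<ipv4>:80            # nginx default page
curl -g 'http://[<ipv6>]:80'     # nginx default page
docker compose down

# Under swarm, only IPv4 responds:
docker swarm init
docker stack deploy --compose-file ./docker-compose.yml test
curl http://<ipv4>:80            # still works
curl -g 'http://[<ipv6>]:80'     # fails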

Updating /etc/docker/daemon.json to enable ipv6 and set fixed-cidr-v6, as described here, does not help.
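For reference, the daemon.json change that was tried looks roughly like this; the prefix shown is just the IPv6 documentation range, not the value actually used on the host:

{
  "ipv6": true,
  "fixed-cidr-v6": "2001:db8:1::/64"
}

These keys only configure the default bridge network, which is presumably why they make no difference to the swarm ingress/overlay networking used by the stack.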

The situation can be worked around by publishing the caddy ports in host mode, as shown below. With this change, both IPv4 and IPv6 connectivity work.

version: "3.9"

services:

  caddy:
    image: caddy
    deploy:
      mode: global
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
    volumes:
      - /etc/caddy:/etc/caddy:ro

  nginx:
    image: nginx
avsm (Member) commented Mar 5, 2023

After consulting with the local Docker maintainer @djs55, we've come to the conclusion that there's no easy workaround for the Swarm issues without a more serious dive into the codebase (and ipvs support in the Linux kernel).

However, we only use Swarm in single-node mode, for the purposes of docker service and docker stack. We could simply switch to docker compose instead, as we don't really use any of the fancy features of docker stack (such as autoscaling), and it would give us a much simpler operational configuration for the hosts. @talex5 might have an opinion here.

talex5 commented Mar 5, 2023

If you only have one host then the current solution of mode: host should work fine. If you have multiple hosts, then docker compose won't help anyway.

The main benefits of docker service (last time I checked) were that it starts the services automatically on reboot and allows secrets management. But I'm not sure what compose does these days - I remember they were trying to unify the two systems.

avsm (Member) commented Mar 5, 2023

I think that compose supports restarting services on reboot (or rather, the underlying engine just seems to restart them). But secrets support does need some investigation...

If compose does work for secrets, it's altogether just a simpler deployment since the swarm layer seems increasingly out of fashion these days.
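For what it's worth, a minimal sketch of how plain docker compose could cover both points, assuming file-based secrets are acceptable; the secret name and path below are made up for illustration:

services:
  caddy:
    image: caddy
    restart: unless-stopped        # the engine restarts the container after a daemon or host restart
    secrets:
      - example_api_key            # mounted at /run/secrets/example_api_key inside the container

secrets:
  example_api_key:
    file: ./secrets/example_api_key   # illustrative path; compose reads the secret from a local file

Unlike swarm secrets, these are essentially bind-mounted files rather than values stored by the cluster, so whether that is good enough here is part of the investigation.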

talex5 commented Mar 6, 2023

If compose does work for secrets, it's altogether just a simpler deployment since the swarm layer seems increasingly out of fashion these days.

If we don't need the fancy multi-host networking (which had many problems last time I checked, not just IPv6) then an even simpler setup would be to remove Docker completely and instead have OCurrent push a NixOS configuration for the service to the host and then do nixos-rebuild switch. Systemd would handle upgrading without dropping connections. The main disadvantage is that most security features are opt-in in systemd (e.g. https://www.redhat.com/sysadmin/systemd-secure-services).
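To illustrate the opt-in point, here is a sketch of the kind of hardening directives a unit has to request explicitly; the unit name and ExecStart path are hypothetical:

# /etc/systemd/system/example-proxy.service (hypothetical unit)
[Service]
ExecStart=/usr/local/bin/example-proxy
DynamicUser=yes            # run as a transient unprivileged user
ProtectSystem=strict       # mount /usr, /boot and /etc read-only for this service
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes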

avsm (Member) commented Mar 6, 2023

I'd prefer to do things one step at a time: simplifying the existing Docker setup, and subsequently evaluating a potential switch to NixOS.

mtelvers (Collaborator, author) commented Mar 6, 2023

We also update the images via OCurrent Deployer using docker service update --image, which only works for a swarm service, not for a Docker Compose deployment. As we only have a single-host deployment, isn't the current workaround good enough? It has all the features we need, with the only downside being that we need to remember to add it to the stack.
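For comparison, the update the deployer performs today versus a rough compose equivalent; the service and image names below follow the example stack above, so treat them as illustrative:

# Swarm: what OCurrent Deployer does today
docker service update --image caddy:latest test_caddy

# Compose: the closest equivalent would be something like
docker compose pull caddy
docker compose up -d caddy

The compose form recreates the container in place, whereas docker service update goes through swarm's rolling-update machinery.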

mtelvers (Collaborator, author) commented Mar 6, 2023

Similar issues can be seen on GitHub: moby/moby#24379, moby/moby#24847, moby/moby#43643.

avsm (Member) commented Mar 7, 2023

I worry a little about the complexity of the stack on a single host. I'm ok with this workaround in the short term, but we do need to think about the best way to replace it in the longer term. There seems to be a tension between:

  • deploying existing service stacks (like PeerTube), which come with their own stack/compose setups and are most easily run with Docker;
  • going "Linux-native" with systemd directly, possibly with NixOS to aid system configuration, for zero-downtime upgrades;
  • integrating MirageOS pieces, like DNS and Let's Encrypt certificates for OCaml (#27), which generally benefit from more direct Linux-style deployment.

So far, balancing these pieces in the medium term points me towards picking a simple base image (like Alpine) and adding OCurrent support for maintaining packages on the host (reproducibly). NixOS+Mirage brings its own multiplicity of painful tool interactions that I'm not convinced we want to jump into just yet :-)

tmcgilchrist (Collaborator) commented Mar 7, 2023

@avsm Is there no plan to support IPv6 with docker services? It's a worrying omission that, judging by the links @mtelvers shared, is not getting much attention.
Regardless, we have an immediate problem of supporting IPv6 for certain systems (watch.ocaml.org and ocaml.org), which @mtelvers has a reasonable short-term fix for.

I am worried about the complexity of introducing Nix into our infrastructure: everywhere I've used it for ops it has been a huge maintenance burden and has required dedicated Nix experts to keep it running. Given the limited resources we have for looking after infra, and that neither Mark nor I have experience with it, we should avoid it.

we do need to think about the best way to replace it in the longer term

Agreed, there needs to be a plan for how to manage this; we are in a very reactive mode with all these infrastructure issues.

avsm self-assigned this May 16, 2023