Kourier does not gracefully shut down #1118

Closed
dermasmid opened this issue Sep 19, 2023 · 4 comments · Fixed by #1203

Comments

@dermasmid

When scaling down or updating the 3scale-kourier-gateway deployment, we see a spike in failed requests. It seems that Envoy does not wait for in-flight requests to finish before exiting.

@dermasmid
Author

Here are the shutdown logs (newest entry first):
ERROR 2023-09-19T22:19:14.415083930Z [resource.labels.containerName: kourier-gateway] [2023-09-19 22:19:14.414][1][info][main] [source/server/server.cc:951] exiting
ERROR 2023-09-19T22:19:14.412906636Z [resource.labels.containerName: kourier-gateway] [2023-09-19 22:19:14.412][1][warning][config] [./source/common/config/grpc_stream.h:163] StreamAggregatedResources gRPC config stream to xds_cluster closed: 13,
ERROR 2023-09-19T22:19:13.999309548Z [resource.labels.containerName: kourier-gateway] [2023-09-19 22:19:13.999][1][info][main] [source/server/server.cc:899] main dispatch loop exited
ERROR 2023-09-19T22:19:13.999296536Z [resource.labels.containerName: kourier-gateway] [2023-09-19 22:19:13.999][1][info][main] [source/server/server.cc:964] shutting down server instance
ERROR 2023-09-19T22:19:13.999249884Z [resource.labels.containerName: kourier-gateway] [2023-09-19 22:19:13.999][1][warning][main] [source/server/server.cc:833] caught ENVOY_SIGTERM

@skonto
Contributor

skonto commented Dec 1, 2023

We probably need to consider something like what is discussed in envoyproxy/envoy#19369.
A couple of options for draining requests are documented at https://www.envoyproxy.io/docs/envoy/latest/operations/admin#operations-admin-interface-drain, and Istio uses something similar: https://github.com/istio/istio/blob/master/pkg/envoy/admin.go#L27-L28.
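For readers who have not used the admin interface, here is a minimal sketch of what triggering a drain could look like from Go. It only assumes Envoy's documented `POST /drain_listeners` admin endpoint with its `graceful` parameter. The admin address (for example `http://127.0.0.1:9901`) and the helper names `drainListenersURL` and `requestDrain` are hypothetical; this is not Kourier's or Istio's actual code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// drainListenersURL builds the URL of Envoy's drain_listeners admin endpoint.
// adminBase is a hypothetical admin address such as "http://127.0.0.1:9901";
// the real address depends on how the gateway's admin interface is configured.
func drainListenersURL(adminBase string, graceful bool) (string, error) {
	u, err := url.Parse(adminBase)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/drain_listeners"
	u.RawQuery = ""
	if graceful {
		// "graceful" asks Envoy to drain progressively instead of
		// stopping the listeners immediately.
		u.RawQuery = "graceful"
	}
	return u.String(), nil
}

// requestDrain sends the POST that asks Envoy to start draining listeners.
// It does not wait for in-flight requests to finish.
func requestDrain(client *http.Client, adminBase string) error {
	target, err := drainListenersURL(adminBase, true)
	if err != nil {
		return err
	}
	resp, err := client.Post(target, "text/plain", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("drain request failed: %s: %s",
			resp.Status, strings.TrimSpace(string(body)))
	}
	return nil
}
```

A shutdown hook would call something like `requestDrain(http.DefaultClient, "http://127.0.0.1:9901")` before the container receives SIGTERM. Whether that alone is enough for Kourier is exactly what this issue is trying to establish.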

@norbjd
Contributor

norbjd commented Jan 31, 2024

Hello @skonto 👋 is there work in progress on this specific issue? We've also noticed that problem. If not, what could be the next steps? I can help if required. Thanks!

@norbjd
Contributor

norbjd commented Feb 1, 2024

Hello 👋 just so you know, I took a look at the issue and found some interesting stuff. Once the prerequisite #1200 is merged (to fix an existing bug), I'll create a PR to add the "drain" or "finish in-flight requests" logic before shutting down. If you're interested, I made a first hacky version here: main...norbjd:net-kourier:gateway-prestop-hook-wait-until-all-incoming-requests-are-finished (similar to what this comment says). I've tested on my side, and this works.

It seems that today, even the "drain" admin endpoint won't work as expected. I'm not sure, though, so I have to do some testing before creating the definitive PR 🧑‍🔧
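To make the "finish in-flight requests" idea concrete, here is a rough sketch of the waiting logic such a pre-stop step might use. It is not the code from the linked branch. It assumes that the plain-text output of Envoy's admin `/stats` endpoint (lines of the form `name: value`) is available through a caller-supplied `fetch` function, and that the `downstream_rq_active` gauges reflect in-flight requests. The names `sumActiveRequests`, `waitForIdle`, and `errDrainTimeout` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

var errDrainTimeout = errors.New("timed out waiting for in-flight requests")

// sumActiveRequests adds up every gauge whose name ends in
// "downstream_rq_active" in Envoy's plain-text stats output.
// Gauges under "http.admin." are skipped on the assumption that the
// stats query itself is counted as an active request on the admin listener.
func sumActiveRequests(stats string) (int, error) {
	total := 0
	for _, line := range strings.Split(stats, "\n") {
		name, value, ok := strings.Cut(strings.TrimSpace(line), ": ")
		if !ok || !strings.HasSuffix(name, "downstream_rq_active") {
			continue
		}
		if strings.HasPrefix(name, "http.admin.") {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(value))
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", line, err)
		}
		total += n
	}
	return total, nil
}

// waitForIdle polls fetch until no requests are active or the timeout
// elapses. fetch is expected to return the current stats text, for example
// the body of a GET against the admin /stats endpoint.
func waitForIdle(fetch func() (string, error), interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		stats, err := fetch()
		if err != nil {
			return err
		}
		active, err := sumActiveRequests(stats)
		if err != nil {
			return err
		}
		if active == 0 {
			return nil
		}
		if time.Now().After(deadline) {
			return errDrainTimeout
		}
		time.Sleep(interval)
	}
}
```

Passing `fetch` in as a function keeps the polling loop independent of how the admin interface is reached (TCP port or Unix socket), which the thread does not specify. The timeout should stay below the pod's termination grace period, otherwise the container is killed before the wait returns.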
