Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful shutdown / SIGTERM #6

Open
njsmith opened this issue Aug 20, 2019 · 9 comments · May be fixed by #41
Open

Graceful shutdown / SIGTERM #6

njsmith opened this issue Aug 20, 2019 · 9 comments · May be fixed by #41

Comments

@njsmith
Copy link
Member

njsmith commented Aug 20, 2019

When Heroku decides to restart our app (e.g. for an upgrade, or just automatically once a day), it sends SIGTERM and then waits 30 seconds for it to exit.

Right now, we don't have any SIGTERM handling at all, so

This creates some race conditions: for example, suppose that a PR is merged into this repo by a new contributor. This kicks off two processes:

  • snekomatic sees that it's a new contributor's first PR, and tries to invite them and post a comment
  • heroku sees that there's a new commit in snekomatic, and tries to reboot snekomatic to pick up the changes

In practice we kinda get away with it, because it takes heroku a few seconds to notice the new commit and then build a new container image, so snekomatic wins the race. But if snekomatic gets fancier and starts doing more, this will become more of a problem.

It's easy to have a task listen for SIGTERM. But after it gets it, what should it do? I guess we want to do a graceful shutdown of hypercorn (i.e. finish processing current requests, but stop accepting new ones). I don't think hypercorn has any support for this? And Trio doesn't make it particularly easy yet either (cf python-trio/trio#147).

Oh, no, I'm wrong: hypercorn has a config option shutdown_event that it uses for doing graceful shutdown when running in multi-process mode. Maybe we can use this when from API mode too? Is it just as simple as passing shutdown_event=some_trio_event_object as part of our config? CC @pgjones

@pgjones
Copy link

pgjones commented Aug 21, 2019

When Hypercorn receives a SIGTERM it should trigger a graceful shutdown, whereby it stops accepting connections and allows the remaining ones to complete. It then sends a lifespan.shutdown ASGI event and waits for the application to reply with a lifespan.shutdown.complete event which Quart converts the lifespan messages into shutdown (and startup) handling.

In summary you should be able to do,

@app.after_serving
async def shutdown():
    await stuff_to_finish 

Note as well that Quart-Trio introduces a nursery on the app which I hope allows you to create background tasks, and then await them in the after serving.

@app.route("/job")
async def trigger():
    app.nursery.start_soon(something)
    return 201

@app.after_serving
async def shutdown():
    await app.nursery  # I've forgotten the syntax 

@njsmith
Copy link
Member Author

njsmith commented Aug 21, 2019

@pgjones I'm using hypercorn.trio.serve

@pgjones
Copy link

pgjones commented Aug 22, 2019

Ah, I see.

I'm thinking the hypercorn.trio.serve function needs another argument of the type shutdown: Optional[Awaitable[None]] = None which would allow usage like,

trigger_shutdown = trio.Event()
nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait))
...
trigger_shutdown.set()  # Trigger graceful shutdown of serve

What do you think?

@njsmith
Copy link
Member Author

njsmith commented Aug 22, 2019

I think you mean Optional[Callable[[], Awaitable[None]]] or similar, but otherwise sure that would work.

@pgjones
Copy link

pgjones commented Aug 22, 2019

Sorry there was a typo, it should have been

nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait()))

So that you could pass any awaitable (rather than a callable). Do you think it should be a Callable?

@njsmith
Copy link
Member Author

njsmith commented Aug 22, 2019

Trio's normal convention is that we pretend awaitable objects don't exist, only async functions. The major advantages are that it simplifies teaching, and that we can easily use pylint/mypy to catch missing awaits. But that means we'd need to disable those checks to use something like shutdown=trigger_shutdown.wait().

tl;dr: yeah I think a callable would be better

@pgjones
Copy link

pgjones commented Aug 23, 2019

Sounds sensible, I've added in https://gitlab.com/pgjones/hypercorn/commit/33a6f1a3847382921ec3d37143145d1bf06ee263. I'll release soon, so please say if it doesn't do the job.

@njsmith
Copy link
Member Author

njsmith commented Aug 23, 2019

I haven't tried using it, but the diff looks fine :-)

Re: the commit message, just in case one of us is confused: AFAIK there's no problem with regular cancellation on trio, it's just not graceful :-)

@pgjones
Copy link

pgjones commented Aug 26, 2019

Sorry about the commit message, it isn't very clear (I can't quite remember the reasoning now either).

I've released 0.8.0 now, if you'd like to use the shutdown trigger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants