Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide what to do about a single exception raised out of a nursery #2301

Closed
oremanj opened this issue Apr 21, 2022 · 8 comments
Closed

Decide what to do about a single exception raised out of a nursery #2301

oremanj opened this issue Apr 21, 2022 · 8 comments

Comments

@oremanj
Copy link
Member

oremanj commented Apr 21, 2022

In current Trio, if one task in a nursery raises an exception, then that exception directly propagates out of the nursery. If multiple tasks in a nursery raise exceptions, then they are wrapped into a single MultiError at the nursery boundary.

This is error-prone because it means except FooError: will catch a FooError across a nursery boundary, but only if there was just a single exception raised. As soon as another error appears in tandem, except FooError: won't catch anything, because MultiError doesn't inherit from FooError. This can be especially surprising if the other error is Cancelled:

with trio.move_on_after(1):
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(trio.sleep_forever)
            try:
                await trio.sleep_forever()
            finally:
                raise trio.BrokenResourceError("oh no, issue during finalization")
    except trio.BrokenResourceError:
        print("handler doesn't execute!")

It has been previously proposed (#611) that we should fix this by always wrapping exceptions in a MultiError/ExceptionGroup at a nursery boundary, even if there is only one exception being raised. That way, using except FooError: will always fail to catch the thing, instead of failing only when multiple exceptions occur simultaneously, which is theoretically easier to debug and understand. This is certainly the way Trio would have been designed if we had been able to predict this issue several years ago.

The problem: by Hyrum's Law, there is probably a lot of existing Trio code that does rely on the ability to write except FooError: and have that catch FooErrors even if they have crossed a nursery boundary since they were raised. This becomes especially tricky when you consider that library code might hide a nursery in an innocuous-seeming async context manager -- so it's hard to even audit for places where such . Maybe such code is technically broken now... but in practice it almost always works, so it's a bit hard to swallow the idea that we should turn it into code that never works. And since there are not really any hooks in the exception-catching mechanism, it would be basically impossible to provide a deprecation warning.

Let's call the desired new behavior (where nurseries always add a layer of ExceptionGroup wrapping, whether there's one exception or multiple) "strict exception semantics", versus the current "loose exception semantics".

Once we switch to ExceptionGroups (#2213), users on Python 3.11 will have an easy way out of this quandary: they can write except* FooError instead of except FooError (and deal with the fact that the actual error they're catching is now an ExceptionGroup[FooError], not a direct FooError). This is forward-compatible: it works with the current loose semantics, and will also work once we switch to strict semantics. Unfortunately, users not on 3.11 yet (which will include most libraries for a while) must convert their exception handlers to functions plus a with exceptiongroups.catch(): context manager, which is significantly clunkier (and is probably the best we can do with the available language features).

This issue is intended to collect potential solutions to the above-described quandary.


My basic proposal is as follows:

  • Merge the ExceptionGroups change backwards-compatibly; do whatever we do about single-exception wrapping under separate cover.
  • Document the issue, encouraging people to use the new except* syntax on 3.11 and later.
  • Provide a trio.run() keyword argument (strict_exception_groups=True?) to opt into strict behavior (which will become the future default), and an open_nursery() keyword argument to override the default for that particular nursery. (Not sure whether the nursery one should nest or not. Probably yes?)
  • Wait until 3.11 is in widespread use.
  • Make targeted changes to encourage libraries to support strict exception semantics. I'm not sure what exactly this would be; perhaps "default strict_exception_groups to True when running under pytest" or something like that.
  • Wait some more time.
  • Make the default be strict_exception_groups=True when running on 3.11+.

I'm not sure we ever need to remove support for strict_exception_groups=False. In particular, it is probably useful indefinitely on an individual-nursery basis, in order to support libraries that "hide" a nursery and expect their background tasks to never raise errors. Users should probably be able to write

try:
    async with open_websocket(...) as ws:
        raise RuntimeError
except RuntimeError:
    print("and have this clause execute")

We might instead want to add a nursery mode more targeted at such cases, that passes through exceptions from the nested child but wraps any background-task exception in some new error like trio.HelperTaskCrashed. We could use this for the system nursery too, instead of the current norm of using trio.TrioInternalError when a system task crashes. This could profitably be combined with the "service nurseries" concept discussed in #1521.

Some optional mix-ins for the above:

  • Default to strict exception semantics on 3.11+ immediately. Assume any libraries affected by this will notice the problem when they test for 3.11 compatibility in the lead-up to the release. Benefits: new users (which are more likely to be using new Python versions, and are definitely not going to know to toggle this obscure trio.run argument) will get the "better" semantics sooner; existing deployments will notice the issue as part of their general "does our thing still work with the latest Python version?" testing. Drawbacks: libraries might be left scrambling.
  • Default to strict exception semantics everywhere rather than limiting it to 3.11+. Benefits: it's a little confusing when a Python upgrade breaks something, because it's not Python's fault and users might not think to check Trio release notes if they didn't also upgrade Trio. Drawbacks: the necessary workarounds to support strict semantics before 3.11 are cumbersome and are likely to annoy our users.
  • Add a deprecation period where users who don't explicitly specify strict_exception_groups=True get a deprecation warning explaining the issue. Benefits: user education. Drawbacks: requires two code changes at different times for users who don't want to see the warning.

I don't feel strongly about what default we choose for strict vs loose semantics and when we change it, but I do feel strongly that users should be able to override whatever we choose.

@belm0
Copy link
Member

belm0 commented Apr 22, 2022

Since the ergonomics of concurrent exceptions were so awful, we quickly learned to avoid raising exceptions (those expected to be handled) from async API's. In the few cases where we did dabble with exceptions, we employed context managers like defer_to_cancelled() and multi_error_defer_to() to get back to a single exception in flight.

But I don't fully understand yet where we're exposed (e.g. catching BrokenResourceError from trio-websocket or other networking library?), nor all the implications when I'm wearing my library maintainer hat.

@agronholm
Copy link
Contributor

But I don't fully understand yet where we're exposed (e.g. catching BrokenResourceError from trio-websocket or other networking library?), nor all the implications when I'm wearing my library maintainer hat.

Do you have code that attempts to catch exceptions falling out of a nursery?

@Zac-HD
Copy link
Member

Zac-HD commented Nov 3, 2022

Closed by #2213 🎉

@Zac-HD Zac-HD closed this as completed Nov 3, 2022
@njsmith
Copy link
Member

njsmith commented Nov 3, 2022

I thought #2213 only implements the max-backwards-compatibility mode where we don't wrap single exceptions? This issue is for how we get to the long-term semantics where all nursery exceptions are wrapped in ExceptionGroup.

@njsmith njsmith reopened this Nov 3, 2022
@Zac-HD
Copy link
Member

Zac-HD commented Nov 3, 2022

Nope, since #2213 (comment) we have https://trio.readthedocs.io/en/stable/reference-core.html#strict-versus-loose-exceptiongroup-semantics

I've enabled the strict semantics globally and it's been fantastic; should be able to share the experience report in a few days 😊

@njsmith
Copy link
Member

njsmith commented Nov 3, 2022 via email

@Zac-HD
Copy link
Member

Zac-HD commented Nov 3, 2022

I don't think we need a deprecation message; just a two-step approach of first changing the default to strict_exception_groups=True and later removing the option. A logical time for the latter would be shortly after 3.10 goes end-of-life and except* can therefore be assumed, which suggests that the midpoint (late 2024) as a reasonable time to change the default. The docs suggest that it'll change "once Python 3.11 and later versions are in wide use" so we can be a bit flexible.

@Zac-HD
Copy link
Member

Zac-HD commented May 18, 2024

Re-closing this following the recent completion of #2785.

@Zac-HD Zac-HD closed this as completed May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants