
It should be possible to get a traceback out of a cancel scope #285

Open · njsmith opened this issue Aug 14, 2017 · 9 comments

njsmith (Member) commented Aug 14, 2017

With fail_after, it's a bit annoying that it can't show you where the block was cancelled. (The traceback just starts at the end of the scope.)

This will be a problem in particular for test timeouts. (Well, they probably won't use fail_after, but they'll have the same problem.)

I don't think we can capture the exceptions in cancel scope objects by default, because we don't want to pin the stacks in memory.

We could have a flag that tells it to save caught exceptions, though. Or try to do something clever with catching them ourselves inside the scope, but that's basically the same thing from the user's point of view, and the cancel scope already knows how to reliably catch its own exceptions.

nmalaguti (Contributor) commented:

What about converting the Cancelled exception caught by the cancel scope into a traceback.TracebackException, which is meant to have enough information to print the stack while ensuring no references are held? This could be stored on the scope and exposed for use by fail_after to set the __cause__ on the TooSlowError. It isn't a real exception, but it should work if someone prints or logs the TooSlowError.

This would also help users that want to debug where their cancellation occurred on a given cancel scope by allowing them to get some printable info from the scope.
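A minimal sketch of the idea, assuming a hypothetical scope class (the class and the cancelled_tb attribute name are made up; the real cancel scope machinery is more involved):

```
import traceback

class SketchScope:
    """Hypothetical scope that keeps a reference-free record of its Cancelled."""

    cancelled_tb = None  # made-up attribute name

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None and type(exc).__name__ == "Cancelled":
            # TracebackException.from_exception() renders the stack into plain
            # data (FrameSummary objects), so no frames or locals stay alive.
            self.cancelled_tb = traceback.TracebackException.from_exception(exc)
            return True  # swallow the Cancelled, as the real scope does
        return False
```

Printing it later would then be something like `"".join(scope.cancelled_tb.format())`.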

nmalaguti (Contributor) commented:

Never mind. The __cause__ must derive from BaseException.

So what are the risks of storing the Cancelled error on the scope object? Make it an option on open_cancel_scope to save the error and make it available? Default true or false? Make it the user's problem if they keep their scopes around with references that hold memory and don't set keep_exception to false?
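For context on the risk being weighed here: a saved exception object pins every frame in its traceback (and those frames' locals) via __traceback__, which is the "pin the stacks in memory" concern from the first comment. A small plain-Python illustration:

```
def leaky():
    big_local = bytearray(10**7)  # ~10 MB we'd like freed promptly
    raise KeyError("oops")

saved = None
try:
    leaky()
except KeyError as exc:
    saved = exc  # keeping the exception keeps its __traceback__ ...

# ... which keeps leaky()'s frame, and therefore big_local, alive:
print(len(saved.__traceback__.tb_next.tb_frame.f_locals["big_local"]))  # 10000000
```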

njsmith (Member, Author) commented Nov 8, 2017 via email

cjerdonek (Contributor) commented:

What about providing a hook for the user to transform the exception how they'd like, and with a sensible default (e.g. saving None or the default formatting of the traceback as a string)? This would also let the user store the exception as is if they wanted.
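A sketch of what such a hook might look like, with the "formatted string" default mentioned above (entirely hypothetical; on_cancelled and cancel_info are not real Trio names):

```
import traceback

def default_on_cancelled(exc):
    # Default transform: keep only a printable rendering, not the live exception.
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )

# Hypothetical usage: the scope would call the hook on the Cancelled it catches
# and store whatever the hook returns.
#     with trio.open_cancel_scope(on_cancelled=default_on_cancelled) as scope:
#         ...
#     print(scope.cancel_info)   # also a made-up attribute
```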

njsmith (Member, Author) commented Nov 8, 2017

Do you have a use case?

cjerdonek (Contributor) commented:

I'm thinking of the times I ran into the same or a similar issue with asyncio. By default, asyncio doesn't seem to provide access to the full stack trace when a CancelledError or TimeoutError occurs: stack traces just "stop" at the point where the asyncio machinery re-raises the exception. This also came up for me in test runs and made troubleshooting more difficult. What I wound up doing in one class of cases was wrapping the coroutine and logging the stack trace before it bubbled up to / was handled by asyncio's machinery. But I can see cases where you might want to do a little more than simple logging. For example, you might want to grab particular attributes off the exception when you're interested in structured data, but without having to store everything, to avoid possible issues of the sort you mention.

Disclaimer: I'm not using trio currently, but I'd like to at some point in the future when it's more mature. After wrestling with a number of the asyncio issues that trio was designed to solve, trio seems like the right way to go. (So thanks for all your work in developing a better way! It will be huge I hope!)

njsmith (Member, Author) commented Nov 15, 2017

@cjerdonek Trio does provide full stack traces in general AFAIK – it even has some horrible code to reconstruct more accurate tracebacks in certain cases. If you run into cases where it doesn't then please file a bug :-).

The issue here is more specific. In Trio, cancellation is done using "cancel scopes": you wrap a chunk of code in a with block, and then you can cancel specifically that block. The way this is implemented is that when the block is cancelled, we throw in a special Cancelled exception, which unwinds the stack back to the with block, which catches the exception. Then your program can continue on, doing whatever it's going to do.

Sometimes, though, the reason you're cancelling something is that you've decided to give up and raise an exception (perhaps because you've gotten tired of waiting). In that case, the first thing you do after the cancel block finishes unwinding is raise another exception. This is what the fail_after convenience function does. But right now, there's no connection between the Cancelled exception that's used to unwind the stack and the real exception we raise afterwards. It would be nice if, in this particular case, we could attach that Cancelled exception as the __context__ of the TooSlowError we raise afterwards, because maybe sometimes it'll give you a useful hint about what operation got stuck.

In this specific situation, we know ahead of time that we want to do this, and we actually want the Cancelled object itself, not any kind of preprocessed replacement (__context__ has to be a real exception object). So a flag asking the cancel scope to hold onto the Cancelled object seems sufficient.
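For concreteness, a rough sketch of what fail_after could do if such a flag existed (the keep_exception flag and the caught_exception attribute are assumptions; trio.move_on_after, cancelled_caught, and trio.TooSlowError are real):

```
from contextlib import contextmanager
import trio

@contextmanager
def fail_after_sketch(seconds):
    with trio.move_on_after(seconds) as scope:  # imagine: keep_exception=True
        yield scope
    if scope.cancelled_caught:
        exc = trio.TooSlowError()
        # __context__ must be a real exception object, which is why a
        # preprocessed stand-in wouldn't work here. `caught_exception` is the
        # assumed attribute that the flag would populate.
        exc.__context__ = getattr(scope, "caught_exception", None)
        raise exc
```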

njsmith (Member, Author) commented Nov 15, 2017

I thought of another complication: there's some subtle interaction between MultiError and this feature. Specifically, there are two fairly common cases to think about:

  • The cancel scope gets a MultiError containing multiple Cancelled exceptions, and wants to catch them all. (This is the usual thing that happens if you cancel a block that contains a nursery with multiple running tasks in it.)

  • The cancel scope gets a MultiError containing a mix of Cancelled exceptions and other things, and wants to catch (some of) the Cancelled exceptions and let the rest propagate.

Right now, MultiError.catch maps a handler over all the exceptions inside a MultiError, and lets you examine each one before deciding whether to catch it or not. Then it discards all the ones that were caught, and if any are left it re-raises a new exception built out of them. (With some cleverness to keep tracebacks correct etc.)

So the simple way to do this would be to have our handler check the flag, and if it's set, then for each primitive Cancelled exception it catches, it stashes it in some set or something. But the problem with this is that you lose the tree structure and traceback that the original MultiError had -- you just get a flat list of exceptions with part of their tracebacks missing. Not really something that you want to attach as a __context__. In the second case above, this is perhaps the best you can manage... I guess we could have some mechanism for MultiError.catch to also extract the caught exceptions and reconstruct them into a proper MultiError?
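A rough sketch of that "stash them somewhere" approach, using the MultiError.catch API described above (the save_caught_exceptions flag and _caught list are invented for illustration):

```
import trio

def catch_cancels(scope):
    def handler(exc):
        if isinstance(exc, trio.Cancelled):  # real code would also check that
                                             # this Cancelled belongs to `scope`
            if scope.save_caught_exceptions:  # hypothetical flag
                scope._caught.append(exc)     # flat list: the MultiError's tree
                                              # structure is lost here
            return None   # swallow this leaf exception
        return exc        # let everything else propagate

    return trio.MultiError.catch(handler)
```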

Another option, a bit hackier but maybe simpler if it works, would be to say oh well, we actually only care about capturing these exceptions when we're in the first case above. So what we could do is have an explicit check where we save the original MultiError, run the MultiError.catch as usual, and then if it swallows everything we know that the original MultiError consisted solely of our Cancelled exceptions and we just save that instead. But this does lead to some weird inconsistencies – we set the "did we catch an exception?" flag based on whether we caught at least one Cancelled (i.e., it's true in both of the cases above), but if we did things this way then we'd only actually have an exception to point to as being the one we caught in the first case, not the second.

oremanj (Member) commented Apr 17, 2018

Funnily enough, it appears that implicit exception chaining is actually doing what we want here; there's some discussion at https://gitter.im/python-trio/general?at=5ad5753a27c509a7741cca44.

```
In [4]: async def afn():
   ...:     try:
   ...:         with trio.fail_after(1):
   ...:             await trio.sleep(5)
   ...:     except trio.TooSlowError as exc:
   ...:         return exc
   ...:

In [5]: exc = trio.run(afn)

In [6]: exc
Out[6]: trio.TooSlowError()

In [7]: exc.__context__
Out[7]: trio.Cancelled()
```

Simpler illustrative example:

```
In [8]: class suppress:
   ...:     def __enter__(self):
   ...:         pass
   ...:     def __exit__(self, *exc_info):
   ...:         return True
   ...:

In [9]: from contextlib import contextmanager

In [10]: @contextmanager
    ...: def chain():
    ...:     with suppress():
    ...:         yield
    ...:     raise ValueError
    ...:

In [11]: with chain():
    ...:     raise KeyError
    ...:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-bf8693b31045> in chain()
      3     with suppress():
----> 4         yield
      5     raise ValueError

<ipython-input-11-6f32b66051a3> in <module>()
      1 with chain():
----> 2     raise KeyError

KeyError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-11-6f32b66051a3> in <module>()
      1 with chain():
----> 2     raise KeyError

/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     97                 value = type()
     98             try:
---> 99                 self.gen.throw(type, value, traceback)
    100             except StopIteration as exc:
    101                 # Suppress StopIteration *unless* it's the same exception that

<ipython-input-10-bf8693b31045> in chain()
      3     with suppress():
      4         yield
----> 5     raise ValueError

ValueError:
```

This is pretty subtle, though, so probably we still want the explicit chaining or a comment in fail_after. It's also unlikely that the implicit chaining will do the right thing with MultiErrors; my guess would be that the entire MultiError gets attached rather than just the Cancelled part, but I haven't verified this.

oremanj added a commit to oremanj/trio that referenced this issue Feb 6, 2019
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today, and it was feeling like too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet, as I wanted to get feedback on the approach before polishing.

An overview:
* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC; see the sketch after this list).
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in Runner.deadlines. If it's in the past, the task has a pending cancellation by definition.
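As a rough sketch, the `CancelLogic` interface described in the list above might look something like this (only `compute_effective_deadline` and its signature appear in the snippets below; the rest is guesswork):

```
from abc import ABC, abstractmethod

class CancelLogic(ABC):
    """Policy hook consulted by a CancelBinding when recomputing deadlines."""

    @abstractmethod
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        """Return (effective_deadline, extra_info) for bindings using this logic."""
```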

Potential advantages:
* Cancellation becomes extensible without changes to _core, via users writing their own CancelLogic and wrapping a core CancelBinding(s) around it. We could even move CancelScope out of _core if we want to make a point.
* Nursery.start() is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of CancelLogic).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:
* We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so.
* There are probably bugs, because aren't there always?

Current cancel logic:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        incoming_deadline = inf if self._shield else parent_effective_deadline
        my_deadline = -inf if self._cancel_called else self._deadline
        return min(incoming_deadline, my_deadline), parent_extra_info
```
Want to support a grace period? I'm pretty sure it would work with something like
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        parent_cleanup_deadline = parent_extra_info.get("effective_cleanup_deadline", parent_effective_deadline)
        if self._shield:
            parent_effective_deadline = parent_cleanup_deadline = inf
        my_cleanup_start = min(self._deadline, self._cancel_called_at)
        merged_cleanup_deadline = min(parent_cleanup_deadline, my_cleanup_start + self._grace_period)
        my_extra_info = parent_extra_info.set("effective_cleanup_deadline", merged_cleanup_deadline)
        if self._shield_during_cleanup:
            effective_deadline = merged_cleanup_deadline
        else:
            effective_deadline = min(parent_effective_deadline, my_cleanup_start)
        return effective_deadline, my_extra_info
```
Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)