Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel scopes with disjoint extents #886

Open
njsmith opened this issue Jan 28, 2019 · 3 comments
Open

Cancel scopes with disjoint extents #886

njsmith opened this issue Jan 28, 2019 · 3 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jan 28, 2019

So! Linked cancel scopes! This is an idea that was first raised in #607 (comment), and also discussed in #835. The latter also has some notes on implementation subtleties that I won't copy here, but are worth reviewing if/when this turns into a PR again.

Here's the problem: If you want to have a single CancelScope that applies to multiple disjoint tasks, then how do you do that? Even with unbound cancel scopes (#835), you can't, because a CancelScope tracks information about both what's been requested, and what actually happened to a specific block of code (.cancelled_caught, or maybe .cancel_succeeded, see #885). If the same scope object could be used for multiple pieces of code simultaneously, then this wouldn't work.

One idea is to let a cancel scope fork off "linked children": new CancelScope objects that are their own independent entity, but that whenever the parent is cancelled this automatically cancels the children as well.

(Do we have any example use cases? I feel like we did, but I don't remember them off the top of my head.)

There's a lot about this that I like, but there are also a lot of potential complexities to think through:

  • Introspection: should there be a way to get the parent scope? The child scopes? The descendent scopes? The initial version of Add support for unbound cancel scopes #835 had a .linked_children attribute that returned all descendents.

  • current_effective_deadline: I guess this would need to walk up the parent chain for each cancel scope to collect up all their deadlines? (Which means we need to store parent links internally.)

  • How do shield attributes and linked scopes interact? In the initial version of Add support for unbound cancel scopes #835, changes to parent.shield propagated to child.shield, but this felt very weird to me. Conceptually, the shield component of a cancel scope is pretty much disjoint from the cancellation component. Maybe this is more evidence for the idea raised in idea: unbound cancel scopes #607, that shielding should be split off into a separate object from cancel scopes?

  • If we add a "soft" cancellation concept (More ergonomic support for graceful shutdown #147), then how will that interact with this?

  • How tightly coupled should the parent and the children be? The two possibilities that come to mind are: (a) the idea described above, where the children are totally independent, full-fledged CancelScope objects, except that parent.cancel() triggers a call to child.cancel(). (b) making them "linked clones", where the parent's deadline is the child's deadline, mutating one deadline mutates them all, cancelling any of them cancels them all, etc. The only thing that would be independent is .cancelled_caught, and maybe in the future some kind of data about what exceptions were caught.

    I guess it won't make a big difference if the way you actually use them in practice is always one of these two stereotyped patterns:

    with cancel_scope_from_somewhere_else.linked_child():
        ...
    
    with cancel_scope_from_somewhere_else.linked_child() as local_cancel_scope:
        ...
    if local_cancel_scope.cancelled_caught:
        ...

.....ugh now that I write that I'm wondering if we should revisit the discussion in #607 about whether cancelled_caught should really be part of the cancel scope itself, versus something separate that you get from __enter__. The main argument against splitting them is that it would make a mess of helpers like move_on_after and fail_after, where if you wanted to expose both the scope object (to allow e.g. adjusting deadlines) and the cancelled_caught object (very important for move_on_after), then having them on two separate objects is annoying, and also compatibility breaking unless we introduce yet a third object, and then Occam starts jumping up and down and waving his arms.

BUT. Here's something we should at least consider:

CancelScope is now becoming a full-fledged class that regular users will interact with. And we could trivially give its constructor a timeout= argument. Supposedly move_on_at and move_on_after are convenience shorthands, but at that point they're not adding a lot of convenience. And fail_at/fail_after are of dubious value to start with, + if you're using them you certainly don't care about the .cancelled_caught attribute.

So here is a possibility we should at least consider:

# Replacement for move_on_after
with CancelScope(timeout=timeout).enter() as was_cancelled:
    ...
if was_cancelled:  # an object with a custom __bool__ method
    ...

# Replacement for move_on_at
with CancelScope(deadline=deadline).enter() as was_cancelled:
    ...

But the key idea here is that the CancelScope object doesn't expose state related to this particular with block, so you could also write cancel_scope = CancelScope(...) and then enter it as many times as you wanted, where-ever you wanted.

fail_after/fail_at could stay like they are now, or maybe become .enter(raise_on_cancel=True)?

I guess was_cancelled.__bool__ would raise an exception if you called it from inside the associated with block?

(While we're at it: consider globally renaming deadlineat, timeoutafter everywhere.)

And hmm, I was thinking that we needed the .enter() there so that we could match up the calls to __enter__ and __exit__, so we knew which was_cancelled object to update. But actually it would be trivial to stash this information on the task, e.g. in the cancel stack. So we could drop the .enter().

If we do decide to provide an option to capture the Cancelled exceptions for later introspection, then that would need something like with cancel_scope.saving_exceptions() as was_cancelled: ..., or with cancel_scope.using(save_exceptions=True) as was_cancelled: ..., but that seems fine.

I don't know if this is a good idea. It's certainly a bigger change than I'd like to be making at this point, now that people are starting to build real projects on top of Trio. But... well, I thought it, so I'm writing it down, and we'll see what you all think (where "you" includes "me, but tomorrow").

@njsmith
Copy link
Member Author

njsmith commented Jan 28, 2019

Hmm, here's a terrible idea: we could make the .cancelled_caught attribute implicitly task-local, so that when you access it, it tells you about what happened the last time this scope was used in this task.

...Yeah, that's a terrible idea, let's not do that. Way too magic.

@oremanj
Copy link
Member

oremanj commented Jan 29, 2019

current_effective_deadline: I guess this would need to walk up the parent chain for each cancel scope to collect up all their deadlines? (Which means we need to store parent links internally.)

An idea I had post #835 was to have cancel scopes calculate a "merged deadline" = min(my deadline, my linked parent's merged deadline), and use that for effective deadline calculations. Then the linked parent doesn't have to be in the loop for cancellations that occur on a deadline, and we don't have to make an exception to the "cancel scopes in Runner.deadlines always have some tasks in them" rule.

How do shield attributes and linked scopes interact? In the initial version of #835, changes to parent.shield propagated to child.shield, but this felt very weird to me. Conceptually, the shield component of a cancel scope is pretty much disjoint from the cancellation component. Maybe this is more evidence for the idea raised in #607, that shielding should be split off into a separate object from cancel scopes?

I did this because at the time I was doing a lot of experiments with cancellation that involved toggling the shield status of multiple scopes at once, and linked children made for a convenient way to manage that. I don't think Trio itself needs to support that; it's easy to bolt on in external code.

One other thought: an especial complexity of the linked children implementation in #835 was its need to carefully batch together all tasks that would be notified on any top-level operation, so that if two nested scopes were both cancelled, the outer one would win. I might take a stab at factoring this out into a with trio.hazmat.batch_cancellations(): context manager, which I think could let us get rid of _cancel_no_notify() and simplify TaskStatus.started() in the process.

whether cancelled_caught should really be part of the cancel scope itself, versus something separate that you get from __enter__.

If we make with CancelScope(...) as was_cancelled: work, then it's cumbersome to make with CancelScope(...) as scope: work, and the latter is the idiom you want for cancel-on-request. In general if the object returned by CancelScope.__enter__() doesn't expose all of CancelScope's attributes, there will be some use cases it makes more cumbersome; and if it does expose all of CancelScope's attributes, I think the distinction between the two might be a point of user confusion. So I'm more on the side of representing disjoint cancellation using explicit linkages between CancelScopes, but I'm willing to be convinced.

Hmm, here's a terrible idea: we could make the .cancelled_caught attribute implicitly task-local, so that when you access it, it tells you about what happened the last time this scope was used in this task.

I actually think that would be a good solution, just because I can't imagine a situation where someone would want to introspect the cancelled_caught attribute of a scope they didn't just exit. But I suspect I'm also more tolerant of excessive magic than you are. :P

oremanj added a commit to oremanj/trio that referenced this issue Feb 3, 2019
This generalizes the "cancel multiple scopes without waking up the tasks, then wake up all the tasks at once"
pattern that was previously a special case for handling cancellations on a deadline. It will simplify the
implementation of python-trio#886, and exposing it as part of hazmat lets users write their own cancellation
abstractions that work the way the native ones do.
oremanj added a commit to oremanj/trio that referenced this issue Feb 3, 2019
This generalizes the "cancel multiple scopes without waking up the tasks, then wake up all the tasks at once"
pattern that was previously a special case for handling cancellations on a deadline. It will simplify the
implementation of python-trio#886, and exposing it as part of hazmat lets users write their own cancellation
abstractions that work the way the native ones do.
oremanj added a commit to oremanj/trio that referenced this issue Feb 3, 2019
This generalizes the "cancel multiple scopes without waking up the tasks, then wake up all the tasks at once"
pattern that was previously a special case for handling cancellations on a deadline. It will simplify the
implementation of python-trio#886, and exposing it as part of hazmat lets users write their own cancellation
abstractions that work the way the native ones do.
oremanj added a commit to oremanj/trio that referenced this issue Feb 6, 2019
Relevant to python-trio#886, python-trio#606, python-trio#285, python-trio#147, python-trio#70, python-trio#58, maybe others.

I was continuing my effort to shoehorn linked cancel scopes and graceful cancellation into `CancelScope` earlier today and it was feeling too much of a mess, so I decided to explore other options. This PR is the result. It makes major changes to Trio's cancellation internals, but barely any to Trio's cancellation semantics -- all tests pass except for one that is especially persnickety about `cancel_called`. No new tests or docs yet as I wanted to get feedback on the approach before polishing.

An overview:
* New class `CancelBinding` manages a single lexical context (a `with` block or a task) that might get a different cancellation treatment than its surroundings. "All plumbing, no policy."
* Each cancel binding has an effective deadline, a _single_ task, and links to parent and child bindings. Each parent lexically encloses its children. The only cancel bindings with multiple children are the ones immediately surrounding nurseries, and they have one child binding per nursery child task plus maybe one in the nested child.
* Each cancel binding calculates its effective deadline based on its parent's effective deadline and some additional data. The actual calculation is performed by an associated `CancelLogic` instance (a small ABC).
* `CancelScope` now implements `CancelLogic`, providing the deadline/shield semantics we know and love. It manages potentially-multiple `CancelBinding`s.
* Cancel stacks are gone. Instead, each task has an "active" (innermost) cancel binding, which changes as the task moves in and out of cancellation regions. The active cancel binding's effective deadline directly determines whether and when `Cancelled` is raised in the task.
* `Runner.deadlines` stores tasks instead of cancel scopes. There is no longer a meaningful state of "deadline is in the past but scope isn't cancelled yet" (this is what the sole failing test doesn't like). If the effective deadline of a task's active cancel binding is non-infinite and in the future, it goes in Runner.deadlines. If it's in the past, the task has a pending cancellation by definition.

Potential advantages:
* Cancellation becomes extensible without changes to _core, via users writing their own CancelLogic and wrapping a core CancelBinding(s) around it. We could even move CancelScope out of _core if we want to make a point.
* Nursery.start() is much simpler.
* Splitting shielding into a separate object from cancellation becomes trivial (they'd be two kinds of CancelLogic).
* Most operations that are performed frequently take constant time: checking whether you're cancelled, checking what your deadline is, entering and leaving a cancel binding. I haven't benchmarked, so it's possible we're losing on constant factors or something, but in theory this should be faster than the old approach.
* Since tasks now have well-defined root cancel bindings, I think python-trio#606 becomes straightforward via providing a way to spawn a system task whose cancel binding is a child of something other than the system nursery's cancel binding.

Caveats:
* We call `current_time()` a lot. Not sure if this is worth worrying about, and could probably be cached if so.
* There are probably bugs, because aren't there always?

Current cancel logic:
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        incoming_deadline = inf if self._shield else parent_effective_deadline
        my_deadline = -inf if self._cancel_called else self._deadline
        return min(incoming_deadline, my_deadline), parent_extra_info
```
Want to support a grace period? I'm pretty sure it would work with something like
```
    def compute_effective_deadline(
        self, parent_effective_deadline, parent_extra_info, task
    ):
        parent_cleanup_deadline = parent_extra_info.get("effective_cleanup_deadline", parent_effective_deadline)
        if self._shield:
            parent_effective_deadline = parent_cleanup_deadline = inf
        my_cleanup_start = min(self._deadline, self._cancel_called_at)
        merged_cleanup_deadline = min(parent_cleanup_deadline, my_cleanup_start + self._grace_period)
        my_extra_info = parent_extra_info.set("effective_cleanup_deadline", merged_cleanup_deadline)
        if self._shield_during_cleanup:
            effective_deadline = merged_cleanup_deadline
        else:
            effective_deadline = min(parent_effective_deadline, my_cleanup_start)
        return effective_deadline, my_extra_info
```
Maybe that's not quite _simple_ but it is miles better than what I was looking at before. :-)
@njsmith
Copy link
Member Author

njsmith commented Feb 6, 2019

I actually think that would be a good solution, just because I can't imagine a situation where someone would want to introspect the cancelled_caught attribute of a scope they didn't just exit.

Well.... ok, you got me thinking about it more :-). But this made me realize another possible problem: I can't think how to make this work in a way that doesn't potentially leak memory. Maybe endowing each Task with a WeakKeyDict[CancelScope, bool]? But even that could get into trouble if you have both long-lived tasks and long-lived cancel scopes that get mixed-and-matched in complicated ways over time...

If we make with CancelScope(...) as was_cancelled: work, then it's cumbersome to make with CancelScope(...) as scope: work, and the latter is the idiom you want for cancel-on-request

Well, with 3.8 we can do with (scope := CancelScope(...)) as was_cancelled:... okay maybe that's not so great looking :-). More realistic would be:

scope = CancelScope(...)
with scope as was_cancelled:
    ...

That's annoying if you really do write it that way, but it seems unusual that we'd want to create a scope → immediately enter it → then mutate it from inside the block. I can think of cases where you want to create a scope in one place and then enter it somewhere else, but that wouldn't be a problem here. And I can think of cases where you want to set some kind of deadline, so you do with CancelScope(deadline=...), but there you don't need the scope object. And I can think of cases where you want to manually cancel the surrounding scope, but really that only makes sense if you're cancelling background tasks, which means you're cancelling a nursery, which means you have nursery.cancel_scope.

Oh, but that is a problem: where do nurseries keep the .was_cancelled object, if it's not an attribute on nursery.cancel_scope? I guess it would have to be nursery.was_cancelled? That's kinda awkward...


Stepping back: this is pretty complicated. Probably we shouldn't commit to a solution for disjoint extents right this moment. Maybe we'll even decide we don't need this feature to be built-in at all. The urgent question is: are we OK with releasing the current unbound CancelScopes that we have in master, or are they making API commitments that we'll later regret?

We've always had cancel scope objects with .cancelled_caught attributes on them, so we may or may decide to change that later but either way it's not a new commitment. One new commitment is the return value from CancelScope.__enter__. We could potentially avoid returning anything from CancelScope.__enter__ for now, in order to keep our options open. I guess that might be pretty informative even? if people find it annoying, then that's evidence that my musing above is wrong and people really do want to create a cancel scope and immediately get access to it :-). OTOH maybe anyone who finds it annoying will just use move_on_after and friends, instead of complaining.

If we do keep returning self from CancelScope.__enter__, how badly does that constrain us? Depends on what we want to do in the future.

Option 1: maybe we'll never add built-in support for "linked" cancel scopes. You can get a similar effect pretty easily in Trio right now: just make a class that hands out CancelScopes, keeps a (weak) list of all the ones its handed out, and pushes cancel/deadline updates to them on demand. Maybe that's good enough. Certainly it's good enough for to get us through this month, and probably next month too :-).

Option 2: if we do commit to keeping CancelScope the same, but then want to add something like the with scope as was_cancelled, how close can we get? Thinking about it more, I realized that if we rethink a bit, we can actually get pretty close. Imagine we introduce a CancelSource object, that has .deadline and .cancel() but no .was_cancelled. And then we let people write with cancel_source as cancel_scope, where this cancel_scope exposes the underlying .deadline and .cancel (or could make them readonly or something), and adds .was_cancelled. We could even define cancel_scope.__bool__ = return self.was_cancelled, if we really wanted to. I don't know if we want to do any of this, but it's reassuring to me that we do have the option.

However! This did make me realize: one of the cute ideas for was_cancelled is that it should raise an error if you try to access it before the scope has ended. (This is particularly helpful because if someone is confused and thinks that .was_cancelled tells you whether cancellation has been requested, then they're likely to get an exception early and realize the error of their ways.)

But with the CancelScope implementation in master currently, there's a problem. We let the same cancel scope be entered multiple times, so it's hard to tell the difference between "this scope is done, it can tell you whether it's been cancelled", and "this scope is about to be entered, trying to check whether it's been cancelled is an error". In fact it can be in both of those states at the same time. And... I guess generally this discussion has made me realize that using the same CancelScope for multiple extents is a really complicated API design issue, and uncertain whether this limited form of it is a good idea or not. (E.g. if we do eventually add CancelSource, then it will be confusing to have two different ways of applying a single CancelSource to multiple pieces of code.)

So... I'm thinking for right now (i.e., before we make the next release!), I'd like to switch to saying that a CancelScope can only be used once, and can't be reused. (And of course we can always relax this later after we have more experience.) Any objection?

oremanj added a commit to oremanj/trio that referenced this issue Feb 7, 2019
To keep our options open for adding a smarter `was_cancelled` property, per discussion in python-trio#886.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants