Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for unbound cancel scopes #835

Merged
merged 14 commits into from
Jan 28, 2019
Merged

Conversation

oremanj
Copy link
Member

@oremanj oremanj commented Jan 2, 2019

Add support for unbound cancel scopes as discussed in #607: the ability to create a CancelScope without entering it, so you can pass it to another task and retain a handle for cancelling some work in that other task.

This change exposes trio.CancelScope as a regular class that users can call to construct cancel scopes, which obviates the need for trio.open_cancel_scope. The latter is therefore deprecated.

@oremanj oremanj requested a review from njsmith January 2, 2019 05:03
@codecov
Copy link

codecov bot commented Jan 2, 2019

Codecov Report

Merging #835 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #835      +/-   ##
==========================================
+ Coverage   99.52%   99.53%   +<.01%     
==========================================
  Files         101      101              
  Lines       12307    12377      +70     
  Branches      906      910       +4     
==========================================
+ Hits        12249    12319      +70     
  Misses         36       36              
  Partials       22       22
Impacted Files Coverage Δ
trio/_core/_traps.py 100% <ø> (ø) ⬆️
trio/__init__.py 100% <ø> (ø) ⬆️
trio/_core/_exceptions.py 100% <ø> (ø) ⬆️
trio/_sync.py 100% <100%> (ø) ⬆️
trio/_core/tests/test_parking_lot.py 100% <100%> (ø) ⬆️
trio/_file_io.py 100% <100%> (ø) ⬆️
trio/testing/_mock_clock.py 100% <100%> (ø) ⬆️
trio/_subprocess.py 100% <100%> (ø) ⬆️
trio/_core/tests/test_run.py 100% <100%> (ø) ⬆️
trio/tests/test_socket.py 100% <100%> (ø) ⬆️
... and 10 more

@oremanj
Copy link
Member Author

oremanj commented Jan 2, 2019

The remaining reported coverage miss is on _run.py line 236, which is a comment. I can't reproduce locally - any idea what might be going on with that?

@oremanj
Copy link
Member Author

oremanj commented Jan 2, 2019

This diff mostly keeps the behavior of "deadlines don't result in calls to cancel() except on scopes that have at least one task attached to them". (It adds a bit of logic to also call cancel() on deadline expiry for scopes that have linked children, but that's unrelated to this question.) I'm not sure if keeping that behavior is necessarily a good thing, because it means that deadline expiries before the scope is entered still don't take effect immediately:

scope = trio.move_on_after(0.5)
await trio.sleep(1)
with scope:
    assert not scope.cancel_called
    await trio.hazmat.cancel_shielded_checkpoint()
    assert scope.cancel_called

I see three reasonable options here:

  • Keep the behavior of this diff as it stands.
  • Remove the logic that keeps cancel scopes out of Runner.deadlines if they have no tasks attached. This will require extra bookkeeping by the run loop to cancel scopes that have already completed successfully (because they're reusable, even if they usually won't get reused), but arguably provides more intuitive behavior if the scope's deadline expires before it's entered, and preserves the current behavior for things like with trio.move_on_after(0). This would technically still be a behavioral change in terms of when cancel_called gets set, but that's less likely to be user-visible than option 3.
  • Continue to keep cancel scopes out of Runner.deadlines if they have no tasks attached (and no linked children), but add logic to cancel the scope immediately upon entering its context if its deadline is in the past. This would make trio.move_on_after(0) immediately cancel its scope, which is a behavioral change, but one we've previously discussed being maybe desirable: Maybe move_on_after(0)/fail_after(0) should immediately cancel their scope #320

Thoughts?

Copy link
Member

@njsmith njsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Finally read through this. Sorry for the delay!

And... sorry for the bother, but would be up for splitting out the linked_child part of this into a followup PR? The changes everywhere to handle unbound cancel scopes look mostly straightforward and fine but there are a lot of them to hold in my head, and the linked scopes part is quite complex and subtle and I have multiple concerns. Try to juggle both at once is difficult :-).

(Notes to self for the future PR: Not sure whether .linked_children should be all descendents, immediate children, or not be exposed at all. I think there's a serious bug in the use of weakrefs: if you have B = A.linked_child(); C = B.linked_child(); del B then now A.cancel() won't cancel B. Wouldn't it be better for the effective deadline to key off of whether ._linked_children is non-empty than whether it's ever been non-empty? I think the interaction of shields and linked children needs a serious rethink... in fact maybe in between the two changes we should split off shielding to a different object, as suggested in the #607 thread.)

trio/_core/_run.py Outdated Show resolved Hide resolved
@@ -199,8 +364,7 @@ def _add_task(self, task):
task._cancel_stack.append(self)

def _remove_task(self, task):
with self._might_change_effective_deadline():
self._tasks.remove(task)
self._tasks.remove(task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The effective deadline can only change by removing a task if the task being removed is the last one in the cancel scope. That will always be the _scope_task, and it will be removed in _close. So I moved the might_change_effective_deadline to _close and removed it here, which saves a bit of overhead when adding and removing other tasks. It also more closely parallels the new might_change_effective_deadline in __enter__.

trio/_core/_run.py Outdated Show resolved Hide resolved
trio/_core/_run.py Outdated Show resolved Hide resolved
if now >= self.deadline:
state = ", cancel soon"
else:
state = ", cancel in {:.2f}sec".format(self.deadline - now)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these might be clearer if we collapse them into , deadline is {:.2f} seconds from now? I get what you mean by "soon" but it's using it in a technical sense that not everyone may understand, and using the word "deadline" connects it back to the deadline= kwarg and .deadline attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is cleaner. I implemented a small variation: "deadline is X seconds from now" or "deadline is Y seconds ago".

@smurfix
Copy link
Contributor

smurfix commented Jan 17, 2019

I think there's a serious bug in the use of weakrefs: if you have B = A.linked_child(); C = B.linked_child(); del B then now A.cancel() won't cancel B.

The way to fix this is to have a non-weak .link_parent entry which links C back to B back to A and thus keeps B alive until C's refcount is zero.

@njsmith
Copy link
Member

njsmith commented Jan 17, 2019

Oh another note for the future linked scopes PR: updating current_effective_deadline

@njsmith
Copy link
Member

njsmith commented Jan 17, 2019

Oh, and regarding the issue of whether entering a scope with a deadline in the past should trigger an immediate cancellation:

I think there are arguments either way. But, there is also a nasty little subtlety that makes changing this more complicated than you'd expect. Consider:

with move_on_after(0.5):
    # Then we do other stuff for a while, without checkpointing
    time.sleep(1)
    # Then we checkpoint, which we expect to cause the whole block to be aborted, because the
    # outer scope's deadline has expired
    await checkpoint()
    print("this should not be printed!")

Right now, the text is not printed. But if we made it so we checked deadlines when entering a cancel scope, it would be (!). That's because checkpoint is defined as:

with CancelScope(deadline=-inf):
    await sleep_forever()

Right now, deadlines are all checked at the same time, in the main run loop, and we do some elaborate tricks to make sure that if two cancel scopes expire at the same time, then it's the outer one that's cancelled.

If we changed CancelScope.__enter__ to check the deadline immediately, that would break this batching... so checkpoint would effectively become

with CancelScope() as cs:
    cs.cancel()
    await sleep_forever()

and now in our original example this scope will cancel the sleep first, before the outer scope, so the message will be printed.

Anyway.... this is all a bit of a mess. I don't think we should try messing with it here in this PR. I suspect we should simplify it somewhat in any case. I'll open an issue for that.

@oremanj
Copy link
Member Author

oremanj commented Jan 19, 2019

Thanks for the commentary! I'll do this split and respond in more detail in a bit.

But first, I was looking at this in the context of #860 and noticed an additional subtlety:

>>> from trio import move_on_after
>>> from trio.hazmat import checkpoint
>>> import time
>>> with move_on_after(0.5):
...     # Then we do other stuff for a while, without checkpointing
...     time.sleep(1)
...     # Then we checkpoint, which we expect to cause the whole block to be aborted, because the
...     # outer scope's deadline has expired
...     await checkpoint()
...     print("this should not be printed!")
... 
>>> with move_on_after(0.5):
...     time.sleep(1)
...     await trio.hazmat.checkpoint_if_cancelled()
...     await trio.hazmat.cancel_shielded_checkpoint()
...     print("but should this?")
... 
but should this?
>>> 

And this isn't just an academic distinction - there's lots of code in trio that uses checkpoint_if_cancelled followed by cancel_shielded_checkpoint to do a checkpoint when it knows it doesn't need to block indefinitely. It's already documented that deadline expiry might not be noticed immediately. So, I'm not sure that "this should not be printed" getting printed in the above example is actually as bad as it looks?

If we don't want either print statement in the above examples to execute, I'd suggest:

  • have checkpoint_if_cancelled() check whether the current effective deadline is less than the current time, rather than whether there's a pending cancel scope
  • implement checkpoint() in terms of the two half-checkpoint operations, or as its own trap, rather than yielding inside a scope with a -inf deadline

@oremanj oremanj changed the title Add support for unbound and linked cancel scopes Add support for unbound cancel scopes Jan 19, 2019
@oremanj
Copy link
Member Author

oremanj commented Jan 19, 2019

I've removed the linked-child changes from this PR, and will repost them as a new PR once this one gets resolved.

I think the interaction of shields and linked children needs a serious rethink... in fact maybe in between the two changes we should split off shielding to a different object, as suggested in the #607 thread.

The more I play with cancellation, the more I'm feeling like the ability to modify shielding dynamically is an extremely powerful and useful feature that I don't want to give up. :-) If we drop it, I think we lose the ability to prototype a number of changes to cancellation semantics outside of _core. For example, I used both dynamic shielding and linked children in the "cancel nursery body before the rest of the children" implementation in #147 (comment). It might be possible to get the same semantics without dynamic shielding changes, but I think it would be a lot more cumbersome.

@njsmith
Copy link
Member

njsmith commented Jan 20, 2019

Yeah, the discussion of cancel-ordering in #147 made me think dynamic shielding may be important, too! But it could be dynamic while still being a separate object, and anyway, this PR probably isn't the best place to talk about it because we'll lose track once it's merged :-)

@python-trio python-trio deleted a comment from codecov bot Jan 22, 2019
@oremanj oremanj closed this Jan 22, 2019
@oremanj oremanj reopened this Jan 22, 2019
Copy link
Member

@njsmith njsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to this one :-). Inevitably, re-reading the more focused PR found a few more small niggles, but I think we can merge once these are addressed!

trio/_core/tests/test_run.py Outdated Show resolved Hide resolved
docs/source/reference-core.rst Show resolved Hide resolved
@njsmith
Copy link
Member

njsmith commented Jan 28, 2019

Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants