
Maybe move_on_after(0)/fail_after(0) should immediately cancel their scope #320

Closed

njsmith opened this issue Sep 6, 2017 · 2 comments

njsmith (Member) commented Sep 6, 2017

It's tempting to think that you can fake a "check once, but don't block" semantics by setting up a 0-timeout cancel scope:

async def fake_accept_nowait(self):
    with fail_after(0):
        await self.accept()

This is not how Trio's cancellation semantics are supposed to work – there's no guarantee that accept won't raise Cancelled before it even attempts the operation.

@sorcio points out in chat that this actually works surprisingly well though. In fact, it probably succeeds deterministically right now (!), because fail_after(0) schedules its cancel scope to be cancelled "immediately", but since it uses a timeout to do this it doesn't actually happen until Trio processes timeouts, which it doesn't do until after this task yields. And if you look at trio._socket._SocketType.accept... it doesn't yield until after trying a non-blocking accept. So right now things work by accident.
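Roughly, the structure that makes this work by accident looks something like the following (a simplified sketch, not Trio's actual accept implementation; sketch_accept and listening_sock are invented names, and the socket is assumed to be a plain non-blocking stdlib socket):

import trio

# Simplified sketch of the "try non-blocking first, only then block" pattern:
async def sketch_accept(listening_sock):
    while True:
        try:
            # Cheap non-blocking attempt -- happens before any checkpoint, so
            # a deadline that only fires on the next timeout-processing pass
            # can't interrupt it.
            return listening_sock.accept()
        except BlockingIOError:
            # First point where the task actually yields; only from here on
            # would a fail_after(0) deadline be noticed and raise Cancelled.
            await trio.hazmat.wait_socket_readable(listening_sock)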

This is dangerous, because it's a tempting thing to try, and the sort of thing people might end up depending on because it seems to work. I've been tempted myself and stopped because I thought it through and realized it wouldn't work – but we can't expect everyone to have my level of familiarity with the subtleties of Trio's semantics :-) – and @sorcio tried it and found it worked...

So, if we want to keep the current semantics, maybe we should special-case move_on_after(0) and fail_after(0) to immediately cancel the scope they create, instead of going through the normal timeout machinery. Of course it would still be possible to get a similar effect with move_on_after(1e-12), but then at least it's obvious that you're doing something potentially race-y.
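To illustrate what that special case might look like, here's a hedged sketch in user space, not the actual implementation (move_on_after_sketch is an invented name; it leans on the cancel scope's cancel() method):

from contextlib import contextmanager

import trio

# Hedged sketch: for a zero timeout, cancel the scope up front instead of
# registering a deadline, so Cancelled is delivered deterministically at the
# task's first checkpoint.
@contextmanager
def move_on_after_sketch(seconds):
    with trio.move_on_after(seconds) as scope:
        if seconds == 0:
            scope.cancel()  # immediate cancellation, no timeout machinery involved
        yield scope

(Presumably the fake_accept_nowait trick above would then fail deterministically instead of appearing to work, which is the point.)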

Alternatively, could we change our semantics to make this work? The obvious problem would be IOCP on Windows. If you use IOCP to implement accept, then the underlying operation is actually "hey Windows, please start an accept running in the background", followed by blocking and going through the event loop to check whether it's done. In practice, if an accept is ready immediately, then it'll probably complete either synchronously (IOCP operations are allowed to do that) or at least before we notice the timeout and call CancelIoEx. But it makes me nervous? I dunno, maybe this would actually be fine.

The bigger problem is that the semantics of "don't deliver cancellations until a task actually blocks rather than just checkpoints" are fundamentally problematic: if you have some heavy computational work to do in the background, then the standard way is to go ahead and do it and just make sure to checkpoint frequently so other tasks can run. If we start delaying cancellations in general, then it becomes impossible to cancel a task like this.
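For a concrete picture of that pattern (crunch_numbers is an illustrative name, not a Trio API):

import trio

# Standard pattern for heavy computation inside a Trio task: checkpoint
# frequently so other tasks can run and cancellation can be delivered.
async def crunch_numbers(chunks):
    total = 0
    for chunk in chunks:
        total += sum(x * x for x in chunk)  # pure CPU work, never blocks
        await trio.sleep(0)  # checkpoint: yields to the scheduler and
                             # (under current semantics) delivers cancellation
    return total

Under "only deliver cancellation when the task actually blocks" semantics, none of those trio.sleep(0) checkpoints would ever deliver the cancellation, since the loop never blocks.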

So if we were going to do this, it would have to be a specific thing the user had to request, I think – and then it would be the user's job to make sure they only use it for operations that either succeed or block quickly, like accept. with trio.move_on_when_blocks(): .... That's... not totally implausible. Certainly it'd be possible to implement, so long as we wrote down a clear definition of the difference between a checkpoint and blocking – which I'm a bit reluctant to do, because adding more distinctions adds complexity. Though we've already opened that door somewhat with trio.testing.wait_all_tasks_blocked… but that's in trio.testing, plus it lets you specify a slop factor exactly to handle things like an IOCP accept needing a few microseconds to be processed.
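For reference, the existing slop-factor hook mentioned above looks like this in use (the cushion value and the surrounding program are made up for illustration):

import trio
import trio.testing

# Wait until every other task has been blocked for at least 1ms, which papers
# over operations (like an IOCP accept) that need a moment to be processed.
async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(trio.sleep, 0.5)
        await trio.testing.wait_all_tasks_blocked(cushion=0.001)
        print("the other task is now blocked in trio.sleep")
        nursery.cancel_scope.cancel()

trio.run(main)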

Probably the immediate thing to do is to add the special case to make zero timeouts actually instantaneous, to keep our options open, and then continue to think about this.

Cross-ref: #242; in some sense this and that are alternatives, since they both potentially provide more general ways to handle XX_nowait.

oremanj (Member) commented Jan 10, 2019

If checkpoint() were implemented as await checkpoint_if_cancelled(); await cancel_shielded_checkpoint(), we could define "blocking" as "anything that calls wait_task_rescheduled". I think that would work for most cases that people want something like move_on_when_blocks(). (I haven't needed it myself - I'm speculating here.)
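Spelling out that decomposition (checkpoint_sketch is just an illustrative name; both pieces already exist in trio.hazmat):

import trio

# A full checkpoint is "deliver any pending cancellation, then
# unconditionally yield to the scheduler".
async def checkpoint_sketch():
    await trio.hazmat.checkpoint_if_cancelled()      # raises Cancelled if one is pending
    await trio.hazmat.cancel_shielded_checkpoint()   # yields, but can't be cancelled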

If we want to support a slop factor too, like wait_all_tasks_blocked does... well, I'm not sure I would say I recommend the following approach, but it was fun to try.

import math
import types
import outcome
import trio
from trio._core._traps import WaitTaskRescheduled
from async_generator import asynccontextmanager

@asynccontextmanager
async def move_on_when_blocks(cushion):
    this_task = trio.hazmat.current_task()

    @types.coroutine
    def wrapper(underlying_coro, cancel_scope):
        # Sits between the task's real coroutine and the run loop, forwarding
        # every trap it yields. When the task is about to truly block (a
        # WaitTaskRescheduled trap), arm the deadline; as soon as the task is
        # rescheduled, disarm it again.
        try:
            next_trap = "hello"
            while next_trap != "goodbye":
                if isinstance(next_trap, WaitTaskRescheduled):
                    cancel_scope.deadline = trio.current_time() + cushion
                trap_result = yield next_trap
                cancel_scope.deadline = math.inf
                next_trap = underlying_coro.send(trap_result)
        finally:
            # Put the original coroutine back and let the run loop resume it.
            this_task.coro = underlying_coro
            yield from trio.hazmat.cancel_shielded_checkpoint()

    @types.coroutine
    def get_me_off_this_ride():
        # Sending "goodbye" makes wrapper() leave its loop and unwind.
        yield "goodbye"

    async def async_wrapper(underlying_coro, cancel_scope):
        return await wrapper(underlying_coro, cancel_scope)

    with trio.open_cancel_scope() as scope:
        # Splice the wrapper in as this task's coroutine and prime it.
        this_task.coro = async_wrapper(this_task.coro, scope)
        assert this_task.coro.send(None) == "hello"
        try:
            await trio.hazmat.cancel_shielded_checkpoint()
            yield scope
        finally:
            await get_me_off_this_ride()


async def test(break_out):
    import random

    print(f"-- test with break_out={break_out!r} --")
    async with move_on_when_blocks(0.18 if break_out else 0.22) as scope:
        for _ in range(30):
            value = random.random() / 5
            print(f"sleeping for {value:.2f}sec")
            await trio.sleep(value)
        print("did not break out")
    if scope.cancelled_caught:
        print("broke out")
    assert scope.cancelled_caught == break_out
    print("look, I can sleep as long as I want now")
    await trio.sleep(1)

if __name__ == "__main__":
    trio.run(test, False)
    trio.run(test, True)

$ python test.py
-- test with break_out=False --
sleeping for 0.14sec
sleeping for 0.14sec
sleeping for 0.18sec
sleeping for 0.17sec
sleeping for 0.14sec
sleeping for 0.03sec
sleeping for 0.11sec
sleeping for 0.01sec
sleeping for 0.12sec
sleeping for 0.04sec
sleeping for 0.01sec
sleeping for 0.12sec
sleeping for 0.13sec
sleeping for 0.18sec
sleeping for 0.10sec
sleeping for 0.08sec
sleeping for 0.05sec
sleeping for 0.11sec
sleeping for 0.17sec
sleeping for 0.02sec
sleeping for 0.03sec
sleeping for 0.08sec
sleeping for 0.14sec
sleeping for 0.19sec
sleeping for 0.09sec
sleeping for 0.08sec
sleeping for 0.15sec
sleeping for 0.18sec
sleeping for 0.14sec
sleeping for 0.05sec
did not break out
look, I can sleep as long as I want now
-- test with break_out=True --
sleeping for 0.17sec
sleeping for 0.02sec
sleeping for 0.16sec
sleeping for 0.06sec
sleeping for 0.04sec
sleeping for 0.16sec
sleeping for 0.19sec
broke out
look, I can sleep as long as I want now

njsmith (Member, Author) commented Jan 10, 2019

I suspect we'll end up promoting checkpoint to a first-class trap, as an optimization.

I think it is definitely not proven yet that move_on_when_blocks is a desirable feature :-).

oremanj added a commit to oremanj/trio that referenced this issue Feb 5, 2019
Fixes python-trio#320. Now that python-trio#901 is implemented, the concerns discussed in python-trio#835 don't apply.